1. Introduction

Mathematicians' obsession with counting led to many interesting and 
far-fetched problems. These lectures are structured around a seemingly 
innocent counting problem: 

Problem 1.1 (Real root counting). Given a system $f = (f_1, \dots, f_n)$
of real polynomial equations in $n$ variables, count the number of real
solutions.

You can also find here a crash-course in Newton iteration. We will 
state and analyze a Newton iteration based 'inclusion-exclusion' algo- 
rithm to count (and find) roots of real polynomials. 

That algorithm was investigated in a sequence of three papers by
Felipe Cucker, Teresa Krick, Mario Wschebor and myself (2008, 2009,
2012). Good numerical properties are proved in the first paper. For
instance, the algorithm is tolerant to controlled rounding error. Instead
of covering such technicalities, I will present a simplified version and
focus on the main ideas.



The interest of Problem 1.1 lies in the fact that it is complete for
the complexity class $\#P_{\mathbb{R}}$ in the BSS (Blum-Shub-Smale)
model of computation over $\mathbb{R}$. See Blum et al. (1998) for the BSS
model of computation. The class $\#P_{\mathbb{R}}$ was defined by Meer (2000)
as the class of all functions
$f : \mathbb{R}^{\infty} \to \{0,1\}^{\infty} \cup \{\infty\}$ such that there exists a BSS
machine $M$ working in polynomial time and a polynomial $q$ satisfying

$$f(y) = \#\{ z \in \mathbb{R}^{q(\mathrm{size}(y))} : M(y,z) \text{ is an accepting computation} \}.$$

We refer to Bürgisser and Cucker (2006) for the proof of completeness
and to Cucker et al. (2008) for references on the subject of counting
zeros.

Counting real polynomial roots in $\mathbb{R}^n$ can be reduced to counting
polynomial roots in $S^{n+1}$. Given a degree $d$ polynomial $f(x_1, \dots, x_n)$,
its homogenization is $f^{\mathrm{homo}}(x_0, \dots, x_n) = x_0^d\, f(x_1/x_0, \dots, x_n/x_0)$.

Exercise 1.1 (Beware of infinity). Find a homogeneous polynomial
$g = g(y,u)$ of degree 2 in $n + 2$ variables such that

$$\#\{x \in \mathbb{R}^n : f_1(x) = \cdots = f_n(x) = 0\} + 1 = \frac{1}{2}\,\#\{(y,u) \in S^{n+1} : f_1^{\mathrm{homo}}(y) = \cdots = f_n^{\mathrm{homo}}(y) = g(y,u) = 0\}.$$

Because of the exercise above, replacing $n$ by $n - 1$, Problem 1.1
reduces to:

Problem 1.2 (Real root counting on $S^n$). Given a system
$f = (f_1, \dots, f_n)$ of real homogeneous polynomial equations in $n + 1$
variables, count the number of solutions in $S^n$.

This course is organized as follows. We start with a review of
alpha-theory. This theory originated with a couple of theorems proved by
Steve Smale (1986) and improved subsequently by several authors. It
allows one to guarantee (quantitatively), from the available data, that
Newton iterations will converge quadratically to the solution of a system
of equations.

Then I will speak about the inclusion-exclusion algorithm. It uses 
crucially several results of alpha-theory. 

The complexity of the inclusion-exclusion algorithm depends upon
a condition number. By endowing the input space with a probability
distribution, one can speak of the expected value of the condition
number and of the expected running time. The final section is a review of
the complexity analysis performed in Cucker et al. (2009) and Cucker
et al. (2012).



A warning: these lectures are informal. The model of computation
is cloud computing. This means that we will allow for exponentially
many parallel processors (essentially, BSS machines) at no additional
cost. Moreover, we will be informal in the sense that we will assume
that square roots and operator norms can be computed exactly in finite
time. While this does not happen in the BSS model, those can be
approximated and all our algorithms can be rewritten as rigorous BSS
algorithms at the cost of a harder complexity analysis (Cucker et al.,
2008).



Exercise 1.2. What would happen if you could design a true polynomial
time algorithm to solve Problem 1.2?



Acknowledgments. I would like to thank Teresa Krick, Felipe Cucker 
and Mike Shub for pointing out some mistakes in a previous version. 
Part 1. Newton Iteration and Alpha theory

2. Outline 

Let $f$ be a mapping between Banach spaces. Newton Iteration is
defined by

$$N(f,x) = x - Df(x)^{-1} f(x)$$

wherever $Df(x)^{-1}$ exists and is bounded. Its only possible fixed points
are those satisfying $f(x) = 0$. When $f(x) = 0$ and $Df(x)$ is invertible,
we say that $x$ is a nondegenerate zero of $f$.

It is well-known that Newton iteration is quadratically convergent
in a neighborhood of a nondegenerate zero $\zeta$. Indeed,

$$N(f,x) - \zeta = \frac{1}{2} Df(\zeta)^{-1} D^2f(\zeta)(x - \zeta)^2 + \cdots .$$
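The quadratic behaviour can be observed numerically. The following is a minimal sketch, assuming a scalar test problem $f(x) = x^2 - 2$ that is not taken from the lectures; the function names `newton_step` and `newton_errors` are illustrative only.

```python
import math

def newton_step(f, df, x):
    """One Newton step N(f, x) = x - f(x)/f'(x) for a scalar map f."""
    return x - f(x) / df(x)

def newton_errors(f, df, x0, zero, steps):
    """Return |x_i - zero| for i = 0..steps along the Newton sequence."""
    errors = [abs(x0 - zero)]
    x = x0
    for _ in range(steps):
        x = newton_step(f, df, x)
        errors.append(abs(x - zero))
    return errors

# Hypothetical test problem: f(x) = x^2 - 2, with nondegenerate zero sqrt(2).
f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
errors = newton_errors(f, df, 1.5, math.sqrt(2.0), 4)
for i, e in enumerate(errors):
    print(i, e)
```

Each error is bounded by the square of the previous one until machine precision is reached, which is the behaviour the expansion above predicts.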

There are two main approaches to quantify how fast quadratic
convergence is. One of them, pioneered by Kantorovich (1949), assumes
that the mapping $f$ has a bounded second derivative, and that this
bound is known.

The other approach, developed by Smale (1985, 1986) and described
here, assumes that the mapping $f$ is analytic. Then we will be able to
estimate a neighborhood of quadratic convergence around a given zero
(Theorem 4.2) or to certify an 'approximate root' (Theorem 5.3) from
data that depends only on the value and derivatives of $f$ at one point.

A more general exposition on this subject may be found in Dedieu
(1997b), covering also overdetermined and underdetermined polynomial
systems.



3. The gamma invariant 

Through this chapter, $E$ and $F$ are Banach spaces, $D \subseteq E$ is open
and $f : D \to F$ is analytic.

This means that if $x_0 \in E$ is in the domain of $f$, then there is $\rho > 0$
with the property that the series

(1) $$f(x_0) + Df(x_0)(x - x_0) + \frac{1}{2} D^2f(x_0)(x - x_0, x - x_0) + \cdots$$

converges uniformly for $\|x - x_0\| < \rho$, and its limit is equal to $f(x)$.
(For more details about analytic functions between Banach spaces, see
Nachbin (1964, 1969).)

In order to abbreviate notations, we will write (1) as

$$f(x_0) + Df(x_0)(x - x_0) + \sum_{k \ge 2} \frac{1}{k!} D^kf(x_0)(x - x_0)^k$$

where the exponent $k$ means that $x - x_0$ appears $k$ times as an argument
to the preceding multi-linear operator.

The maximum of such $\rho$ will be called the radius of convergence.
(It is $\infty$ when the series (1) is globally convergent.) This terminology
comes from univariate complex analysis. When $E = \mathbb{C}$, the series will
converge for all $x \in B(x_0, \rho)$ and diverge for all $x \notin \overline{B(x_0, \rho)}$. This is no
longer true in several complex variables, or Banach spaces (Exercise 4.1).



NEWTON ITERATION, CONDITIONING AND ZERO COUNTING 5 

The norm of a $k$-linear operator in Banach spaces (such as the $k$-th
derivative) is the operator norm, for instance

$$\|D^kf(x_0)\|_{E \to F} = \sup_{\|u_1\|_E = \cdots = \|u_k\|_E = 1} \|D^kf(x_0)(u_1, \dots, u_k)\|_F .$$

As long as there is no ambiguity, we drop the subscripts of the norm. 

Definition 3.1 (Smale's $\gamma$ invariant). Let $f : D \subseteq E \to F$ be an
analytic mapping between Banach spaces, and $x_0 \in D$. When $Df(x_0)$
is invertible, define

$$\gamma(f, x_0) = \sup_{k \ge 2} \left( \frac{\|Df(x_0)^{-1} D^kf(x_0)\|}{k!} \right)^{\frac{1}{k-1}} .$$

Otherwise, set $\gamma(f, x_0) = \infty$.
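For a univariate polynomial the supremum is a finite maximum and can be evaluated directly. The sketch below is only an illustration under that assumption; the helper names `derivatives` and `gamma` and the coefficient-list convention (lowest degree first) are choices made here, not notation from the lectures.

```python
import math

def derivatives(coeffs, x):
    """Values f(x), f'(x), ..., f^(d)(x) of the polynomial sum(coeffs[j] * x**j)."""
    c = list(coeffs)
    values = []
    while c:
        values.append(sum(a * x**j for j, a in enumerate(c)))
        # Differentiate: coefficient of x^(j-1) becomes j * a_j.
        c = [j * a for j, a in enumerate(c)][1:]
    return values

def gamma(coeffs, x):
    """Smale's gamma invariant of a univariate polynomial at the point x."""
    d = derivatives(coeffs, x)
    if len(d) < 2 or d[1] == 0:
        return math.inf  # Df(x) is not invertible
    return max(((abs(d[k] / d[1]) / math.factorial(k)) ** (1.0 / (k - 1))
                for k in range(2, len(d))), default=0.0)

# Example (an assumption for illustration): f(x) = x^3 - 1 at x = 1.
print(gamma([-1, 0, 0, 1], 1.0))
```

At $x = 1$ the derivatives of $x^3 - 1$ are $0, 3, 6, 6$, so the maximum is attained at $k = 2$ and equals $(6/3)/2! = 1$.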

In the one variable setting, this can be compared to the radius of
convergence $\rho$ of $f(x)/f'(x_0)$, that satisfies

$$\rho^{-1} = \limsup_{k \ge 2} \left( \frac{|f'(x_0)^{-1} f^{(k)}(x_0)|}{k!} \right)^{\frac{1}{k}} .$$

More generally,

Proposition 3.2. Let $f : D \subseteq E \to F$ be a $C^{\infty}$ map between Banach
spaces, and $x_0 \in D$. Then $f$ is analytic in $x_0$ if and only if $\gamma(f, x_0)$ is
finite. The series

(2) $$f(x_0) + Df(x_0)(x - x_0) + \sum_{k \ge 2} \frac{1}{k!} D^kf(x_0)(x - x_0)^k$$

is uniformly convergent for $x \in B(x_0, \rho)$ for any $\rho < 1/\gamma(f, x_0)$.

Proof of the if in Prop. 3.2. The series

$$Df(x_0)^{-1} f(x_0) + (x - x_0) + \sum_{k \ge 2} \frac{1}{k!} Df(x_0)^{-1} D^kf(x_0)(x - x_0)^k$$

is uniformly convergent in $B(x_0, \rho)$ where

$$\rho^{-1} < \limsup_{k \ge 2} \left( \frac{\|Df(x_0)^{-1} D^kf(x_0)\|}{k!} \right)^{\frac{1}{k}} \le \limsup_{k \ge 2} \gamma(f, x_0)^{\frac{k-1}{k}} = \lim_{k \to \infty} \gamma(f, x_0)^{\frac{k-1}{k}} = \gamma(f, x_0).$$

□

Before proving the only if part of Proposition 3.2, we need to relate
the norm of a multi-linear map to the norm of the corresponding
polynomial.

Lemma 3.3. Let $k \ge 2$. Let $T : E^k \to F$ be $k$-linear and symmetric.
Let $S : E \to F$, $S(x) = T(x, x, \dots, x)$ be the corresponding polynomial.
Then,

$$\|T\| \le e^{k-1} \sup_{\|x\| \le 1} \|S(x)\| .$$

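Proof. The polarization formula for (real or complex) tensors is

$$T(x_1, \dots, x_k) = \frac{1}{2^k k!} \sum_{\substack{\epsilon_j = \pm 1 \\ j = 1, \dots, k}} \epsilon_1 \cdots \epsilon_k \, S\!\left( \sum_{l=1}^{k} \epsilon_l x_l \right).$$

It is easily derived by expanding the expression inside parentheses.
There will be $2^k k!$ terms of the form

$$\epsilon_1 \cdots \epsilon_k \, T(x_1, x_2, \dots, x_k)$$

or its permutations. All other terms miss at least one variable (say $x_j$).
They cancel by summing for $\epsilon_j = \pm 1$.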
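The polarization formula can be checked numerically on a small example. The sketch below assumes a symmetric trilinear form on $\mathbb{R}^2$ chosen only for illustration; `polarize`, `S` and `T` are hypothetical names.

```python
import itertools
import math

def polarize(S, xs):
    """Recover T(x_1, ..., x_k) from S(x) = T(x, ..., x) by polarization."""
    k = len(xs)
    dim = len(xs[0])
    total = 0.0
    for eps in itertools.product((1, -1), repeat=k):
        # Evaluate S at the signed sum eps_1 x_1 + ... + eps_k x_k.
        point = [sum(e * x[c] for e, x in zip(eps, xs)) for c in range(dim)]
        total += math.prod(eps) * S(point)
    return total / (2**k * math.factorial(k))

# Illustrative symmetric trilinear form whose diagonal is S(x) = x_0^2 * x_1.
def T(x, y, z):
    return (x[0] * y[0] * z[1] + x[0] * y[1] * z[0] + x[1] * y[0] * z[0]) / 3.0

def S(x):
    return x[0] ** 2 * x[1]

xs = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
print(polarize(S, xs), T(*xs))
```

Both printed values should agree up to rounding, since only the terms containing every $x_j$ exactly once survive the signed sum.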
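It follows that when $\|x_j\| \le 1$ for all $j$,

$$\|T(x_1, \dots, x_k)\| \le \frac{1}{k!} \max_{\substack{\epsilon_j = \pm 1 \\ j = 1, \dots, k}} \left\| S\!\left( \sum_{l=1}^{k} \epsilon_l x_l \right) \right\| \le \frac{k^k}{k!} \sup_{\|x\| \le 1} \|S(x)\| .$$

The Lemma follows from using Stirling's formula,

$$k! \ge \sqrt{2 \pi k}\; k^k e^{-k} e^{1/(12k+1)} .$$

We obtain:

$$\|T\| \le \frac{1}{\sqrt{2 \pi k}} \, e^{-\frac{1}{12k+1}} \, e^{k} \sup_{\|x\| \le 1} \|S(x)\| .$$

Then we use the fact that $k \ge 2$, hence $\sqrt{2 \pi k} \ge e$.

□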



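Proof of Prop. 3.2, only if part. Assume that the series (2) converges
uniformly for $\|x - x_0\| < \rho$. Without loss of generality assume that
$E = F$ and $Df(x_0) = I$.
We claim that

$$\limsup_{k \ge 2} \sup_{\|u\| = 1} \left\| \frac{1}{k!} D^kf(x_0) u^k \right\|^{1/k} \le \rho^{-1} .$$

Indeed, assume that there is $\delta > 0$ and infinitely many pairs $(k_i, u_i)$
with $\|u_i\| = 1$ and

$$\left\| \frac{1}{k_i!} D^{k_i}f(x_0) u_i^{k_i} \right\|^{1/k_i} > \rho^{-1}(1 + \delta).$$

In that case, taking $x = x_0 + \rho (1+\delta)^{-1/2} u_i$, which satisfies $\|x - x_0\| < \rho$,

$$\left\| \frac{1}{k_i!} D^{k_i}f(x_0) (x - x_0)^{k_i} \right\| > (1 + \delta)^{k_i/2}$$

infinitely many times, and hence (2) does not converge uniformly on
$B(x_0, \rho)$.

Now, we can apply Lemma 3.3 to obtain:

$$\limsup_{k \ge 2} \left\| \frac{1}{k!} D^kf(x_0) \right\|^{1/(k-1)} \le e \limsup_{k \ge 2} \sup_{\|u\| = 1} \left\| \frac{1}{k!} D^kf(x_0) u^k \right\|^{1/(k-1)} \le e \lim_{k \to \infty} \rho^{-(1 + 1/(k-1))} = e \rho^{-1}$$

and therefore $\|\frac{1}{k!} D^kf(x_0)\|^{1/(k-1)}$ is bounded. □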
Exercise 3.1. Show the polarization formula for Hermitian product:

$$\langle u, v \rangle = \frac{1}{4} \sum_{\epsilon^4 = 1} \epsilon \, \|u + \epsilon v\|^2 .$$

Explain why this is different from the one in Lemma 3.3.



Exercise 3.2. If one drops the uniform convergence hypothesis in the 
definition of analytic functions, what happens to Proposition 3.2? 

4. The $\gamma$-Theorems

The following concept provides a good abstraction of quadratic con- 
vergence. 

Definition 4.1 (Approximate zero of the first kind). Let $f : D \subseteq E \to F$
be as above, with $f(\zeta) = 0$. An approximate zero of the first
kind associated to $\zeta$ is a point $x_0 \in D$, such that

(1) The sequence $(x_i)$ defined inductively by $x_{i+1} = N(f, x_i)$ is
well-defined (each $x_i$ belongs to the domain of $f$ and $Df(x_i)$ is
invertible and bounded).

(2) $$\|x_i - \zeta\| \le 2^{-2^i + 1} \|x_0 - \zeta\| .$$

The existence of approximate zeros of the first kind is not obvious,
and requires a theorem.

Figure 1. $y = \psi(u)$

Theorem 4.2 (Smale). Let $f : D \subseteq E \to F$ be an analytic map between
Banach spaces. Let $\zeta$ be a nondegenerate zero of $f$. Assume that

$$B = B\!\left( \zeta, \frac{3 - \sqrt{7}}{2 \gamma(f, \zeta)} \right) \subseteq D .$$

Every $x_0 \in B$ is an approximate zero of the first kind associated to
$\zeta$. The constant $(3 - \sqrt{7})/2$ is the largest with that property.

Before going further, we remind the reader of the following fact.

Lemma 4.3. Let $d \ge 1$ be integer, and let $|t| < 1$. Then,

$$\frac{1}{(1 - t)^d} = \sum_{k \ge 0} \binom{k + d - 1}{d - 1} t^k .$$

Proof. Differentiate $d - 1$ times the two sides of the expression
$1/(1 - t) = 1 + t + t^2 + \cdots$, and then divide both sides by $(d - 1)!$. □
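A quick numerical sanity check of Lemma 4.3 is sketched below; the function name `negative_binomial_partial_sum` and the sample values $d = 3$, $t = 1/2$ are assumptions made for the illustration.

```python
import math

def negative_binomial_partial_sum(d, t, terms):
    """Partial sum of sum_{k>=0} C(k+d-1, d-1) t^k, truncated after `terms` terms."""
    return sum(math.comb(k + d - 1, d - 1) * t**k for k in range(terms))

d, t = 3, 0.5
approx = negative_binomial_partial_sum(d, t, 200)
exact = 1.0 / (1.0 - t) ** d
print(approx, exact)
```

The case $d = 2$ of this identity is the one used later to sum the Taylor remainders in closed form.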

Lemma 4.4. The function $\psi(u) = 1 - 4u + 2u^2$ is decreasing and
non-negative in $[0, 1 - \sqrt{2}/2]$, and satisfies:

(3) $$\frac{u}{\psi(u)} < 1 \quad \text{for } u \in [0, (5 - \sqrt{17})/4)$$

(4) $$\frac{u}{\psi(u)} \le \frac{1}{2} \quad \text{for } u \in [0, (3 - \sqrt{7})/2] .$$

The proof of Lemma 4.4 is left to the reader (but see Figure 1).
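The endpoints in Lemma 4.4 are the points where $u/\psi(u)$ reaches $1$ and $1/2$ respectively, which can be confirmed numerically. This is a small sketch; the variable names `u3` and `u4` simply refer to the endpoints in (3) and (4).

```python
import math

def psi(u):
    """The auxiliary function psi(u) = 1 - 4u + 2u^2."""
    return 1.0 - 4.0 * u + 2.0 * u * u

u3 = (5.0 - math.sqrt(17.0)) / 4.0   # endpoint in (3)
u4 = (3.0 - math.sqrt(7.0)) / 2.0    # endpoint in (4)
root = 1.0 - math.sqrt(2.0) / 2.0    # first zero of psi

print(u3 / psi(u3), u4 / psi(u4), psi(root))
```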



Another useful result is: 




Lemma 4.5. Let $A$ be an $n \times n$ matrix. Assume $\|A - I\|_2 < 1$. Then
$A$ has full rank and, for all $y$,

$$\frac{\|y\|}{1 + \|A - I\|_2} \le \|A^{-1} y\|_2 \le \frac{\|y\|}{1 - \|A - I\|_2} .$$

Proof. By hypothesis, $\|Ax\| > 0$ for all $x \ne 0$ so that $A$ has full rank.
Let $y = Ax$. By triangular inequality,

$$\|Ax\| \ge \|x\| - \|(A - I)x\| \ge (1 - \|A - I\|_2) \|x\| .$$

Also by triangular inequality,

$$\|Ax\| \le \|x\| + \|(A - I)x\| \le (1 + \|A - I\|_2) \|x\| .$$

□

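The following Lemma will be needed:

Lemma 4.6. Assume that $u = \|x - y\| \gamma(f, x) < 1 - \sqrt{2}/2$. Then,

$$\|Df(y)^{-1} Df(x)\| \le \frac{(1 - u)^2}{\psi(u)} .$$

Proof. Expanding $y \mapsto Df(x)^{-1} Df(y)$ around $x$, we obtain:

$$Df(x)^{-1} Df(y) = I + \sum_{k \ge 2} \frac{1}{(k-1)!} Df(x)^{-1} D^kf(x) (y - x)^{k-1} .$$

Rearranging terms and taking norms, Lemma 4.3 yields

$$\|Df(x)^{-1} Df(y) - I\| \le \frac{1}{(1 - \gamma \|y - x\|)^2} - 1 .$$

By Lemma 4.5 we deduce that $Df(x)^{-1} Df(y)$ is invertible, and

(5) $$\|Df(y)^{-1} Df(x)\| \le \frac{1}{1 - \|Df(x)^{-1} Df(y) - I\|} \le \frac{(1 - u)^2}{\psi(u)} .$$

□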



Here is the method for proving Theorem 4.2 and similar ones: first
we study the convergence of Newton iteration applied to a 'universal'
function. In this case, set

$$h_{\gamma}(t) = t - \gamma t^2 - \gamma^2 t^3 - \cdots = t - \frac{\gamma t^2}{1 - \gamma t} .$$

(See Figure 2.)

The function $h_{\gamma}$ has a zero at $t = 0$, and $\gamma(h_{\gamma}, 0) = \gamma$. Then, we
compare the convergence of Newton iteration applied to an arbitrary
function to the convergence when applied to the universal function.

Figure 2. $y = h_{\gamma}(t)$

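Lemma 4.7. Assume that $0 \le u_0 = \gamma |t_0| < (5 - \sqrt{17})/4$. Then the sequences

$$t_{i+1} = N(h_{\gamma}, t_i) \quad \text{and} \quad u_{i+1} = \frac{u_i^2}{\psi(u_i)}$$

are well-defined for all $i$, $\lim_{i \to \infty} t_i = 0$, and

$$\frac{|t_i|}{|t_0|} = \frac{u_i}{u_0} \le \left( \frac{u_0}{\psi(u_0)} \right)^{2^i - 1} .$$

Moreover,

$$\frac{|t_i|}{|t_0|} \le 2^{-2^i + 1}$$

for all $i$ if and only if $u_0 \le (3 - \sqrt{7})/2$.

Proof. We just compute

$$h_{\gamma}'(t) = \frac{\psi(\gamma t)}{(1 - \gamma t)^2}, \qquad t h_{\gamma}'(t) - h_{\gamma}(t) = - \frac{\gamma t^2}{(1 - \gamma t)^2}, \qquad N(h_{\gamma}, t) = - \frac{\gamma t^2}{\psi(\gamma t)} .$$

When $u_0 < (5 - \sqrt{17})/4$, (3) implies that the sequence $u_i$ is decreasing, and
by induction

$$u_i = \gamma |t_i| .$$

Moreover,

$$\frac{u_{i+1}}{u_0} = \left( \frac{u_i}{u_0} \right)^2 \frac{u_0}{\psi(u_i)} \le \left( \frac{u_i}{u_0} \right)^2 \frac{u_0}{\psi(u_0)} < \left( \frac{u_i}{u_0} \right)^2 .$$

By induction,

$$\frac{u_i}{u_0} \le \left( \frac{u_0}{\psi(u_0)} \right)^{2^i - 1} .$$

This also implies that $\lim t_i = 0$.

When furthermore $u_0 \le (3 - \sqrt{7})/2$, $u_0/\psi(u_0) \le 1/2$ by (4), hence
$u_i/u_0 \le 2^{-2^i + 1}$. For the converse, if $u_0 > (3 - \sqrt{7})/2$, then

$$\frac{|t_1|}{|t_0|} = \frac{u_0}{\psi(u_0)} > \frac{1}{2} .$$

□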
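The bound of Lemma 4.7 can be observed on the universal function itself. The sketch below iterates Newton's method on $h_{\gamma}$ and compares $\gamma |t_i| / u_0$ with $(u_0/\psi(u_0))^{2^i - 1}$; the sample values $\gamma = 2$, $t_0 = 0.05$ and the helper names are assumptions for the illustration.

```python
def psi(u):
    return 1.0 - 4.0 * u + 2.0 * u * u

def h(gamma, t):
    """Universal function h_gamma(t) = t - gamma t^2 / (1 - gamma t)."""
    return t - gamma * t * t / (1.0 - gamma * t)

def dh(gamma, t):
    """Derivative h_gamma'(t) = psi(gamma t) / (1 - gamma t)^2."""
    return psi(gamma * t) / (1.0 - gamma * t) ** 2

def scaled_iterates(gamma, t0, steps):
    """Return gamma * |t_i| for the Newton sequence t_{i+1} = N(h_gamma, t_i)."""
    t = t0
    out = [gamma * abs(t)]
    for _ in range(steps):
        t = t - h(gamma, t) / dh(gamma, t)
        out.append(gamma * abs(t))
    return out

gamma, t0 = 2.0, 0.05          # so that u_0 = 0.1 < (5 - sqrt(17))/4
u0 = gamma * abs(t0)
ratio = u0 / psi(u0)
scaled = scaled_iterates(gamma, t0, 5)
for i, s in enumerate(scaled):
    print(i, s / u0, ratio ** (2**i - 1))
```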



Before proceeding to the proof of Theorem 4.2, a remark is in order.

Both Newton iteration and $\gamma$ are invariant with respect to translation
and to linear changes of coordinates: let $g(x) = A f(x - \zeta)$, where $A$ is
a continuous and invertible linear operator from $F$ to $E$. Then

$$N(g, x + \zeta) = N(f, x) + \zeta \quad \text{and} \quad \gamma(g, x + \zeta) = \gamma(f, x).$$

Also, distances in $E$ are invariant under translation.

Proof of Th. 4.2. Assume without loss of generality that $\zeta = 0$ and
$Df(\zeta) = I$. Set $\gamma = \gamma(f, \zeta)$, $u_0 = \|x_0\| \gamma$, and let $h_{\gamma}$ and the sequence
$(u_i)$ be as in Lemma 4.7.
We will bound

(6) $$\|N(f, x)\| = \|x - Df(x)^{-1} f(x)\| \le \|Df(x)^{-1}\| \, \|f(x) - Df(x) x\| .$$

The Taylor expansions of $f$ and $Df$ around $0$ are respectively:

$$f(x) = x + \sum_{k \ge 2} \frac{1}{k!} D^kf(0) x^k$$

and

(7) $$Df(x) = I + \sum_{k \ge 2} \frac{1}{(k-1)!} D^kf(0) x^{k-1} .$$

Combining the two equations above, we obtain:

$$f(x) - Df(x) x = - \sum_{k \ge 2} \frac{k - 1}{k!} D^kf(0) x^k .$$

Using Lemma 4.3 with $d = 2$, the rightmost term in (6) is bounded
above by

(8) $$\|f(x) - Df(x) x\| \le \sum_{k \ge 2} (k - 1) \gamma^{k-1} \|x\|^k = \frac{\gamma \|x\|^2}{(1 - \gamma \|x\|)^2} .$$

Combining Lemma 4.6 and (8) in (6), we deduce that

$$\|N(f, x)\| \le \frac{\gamma \|x\|^2}{\psi(\gamma \|x\|)} .$$

By induction, $\gamma \|x_i\| \le u_i$. When $u_0 \le (3 - \sqrt{7})/2$, we obtain as in
Lemma 4.7 that

$$\frac{\|x_i\|}{\|x_0\|} \le \frac{u_i}{u_0} \le 2^{-2^i + 1} .$$

We have seen in Lemma 4.7 that the bound above fails for $i = 1$
when $u_0 > (3 - \sqrt{7})/2$. □

Notice that in the proof above, 

Therefore, convergence is actually faster than predicted by the defi- 
nition of approximate zero. We proved actually a sharper result: 

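Theorem 4.8. Let $f : D \subseteq E \to F$ be an analytic map between Banach
spaces. Let $\zeta$ be a nondegenerate zero of $f$. Let $u_0 < (5 - \sqrt{17})/4$.
Assume that

$$B = B\!\left( \zeta, \frac{u_0}{\gamma(f, \zeta)} \right) \subseteq D .$$

If $x_0 \in B$, then the sequences

$$x_{i+1} = N(f, x_i) \quad \text{and} \quad u_{i+1} = \frac{u_i^2}{\psi(u_i)}$$

are well-defined for all $i$, and

$$\frac{\|x_i - \zeta\|}{\|x_0 - \zeta\|} \le \frac{u_i}{u_0} \le \left( \frac{u_0}{\psi(u_0)} \right)^{2^i - 1} .$$

Table 1 and Figure 3 show how fast $u_i / u_0$ decreases in terms of $u_0$
and $i$.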

To conclude this section, we need to address an important issue for 
numerical computations. Whenever dealing with digital computers, it 
is convenient to perform calculations in floating point format. This 
means that each real number is stored as a mantissa (an integer. 
  i     u_0 = 1/32     1/16      1/10      1/8     (3 - \sqrt{7})/2
  1        4.810      3.599     2.632     2.870        1.000
  2       14.614     11.169     8.491     6.997        3.900
  3       34.229     26.339    20.302    16.988       10.229
  4       73.458     56.679    43.926    36.977       22.954
  5      151.917    117.358    91.175    76.954       48.406

Table 1. Values of $-\log_2(u_i/u_0)$ in function of $u_0$ and $i$.




Figure 3. Values of $\log_2(u_i/u_0)$ in function of $u_0$ for
$i = 1, \dots, 4$.



typically no more than $2^{24}$ or $2^{53}$) times an exponent. (The IEEE-754
standard for computer arithmetic (The Institute of Electrical and
Electronics Engineers Inc, 2008) is taught at elementary numerical analysis
courses, see for instance Higham (2002, Ch.2).)

By using floating point numbers, a huge gain of speed is obtained
with regard to exact representation of, say, algebraic numbers. However,
computations are inexact (by a typical factor of $2^{-24}$ or $2^{-53}$).
Therefore, we need to consider inexact Newton iteration. An obvious
modification of the proof of Theorem 4.2 gives us the following
statement:

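Theorem 4.9. Let $f : D \subseteq E \to F$ be an analytic map between Banach
spaces. Let $\zeta$ be a nondegenerate zero of $f$. Let

$$0 \le 2 \delta \le u_0 \le 2 - \frac{\sqrt{14}}{2} .$$

Assume that

(1) $$B = B\!\left( \zeta, \frac{u_0}{\gamma(f, \zeta)} \right) \subseteq D .$$

(2) $x_0 \in B$, and the sequence $x_i$ satisfies

$$\|x_{i+1} - N(f, x_i)\| \, \gamma(f, \zeta) \le \delta .$$

(3) The sequence $u_i$ is defined inductively by

$$u_{i+1} = \frac{u_i^2}{\psi(u_i)} + \delta .$$

Then the sequences $u_i$ and $x_i$ are well-defined for all $i$, $x_i \in D$, and

$$\frac{\|x_i - \zeta\|}{\|x_0 - \zeta\|} \le \frac{u_i}{u_0} \le \max\!\left( 2^{-2^i + 1}, \frac{2 \delta}{u_0} \right) .$$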

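Proof. By hypothesis,

$$\frac{u_0}{\psi(u_0)} + \frac{\delta}{u_0} < 1$$

so the sequence $u_i$ is decreasing and positive. For short, let $q = u_0/\psi(u_0) \le 1/4$.
By induction,

$$\frac{u_{i+1}}{u_0} \le \frac{u_0}{\psi(u_i)} \left( \frac{u_i}{u_0} \right)^2 + \frac{\delta}{u_0} \le \frac{1}{4} \left( \frac{u_i}{u_0} \right)^2 + \frac{\delta}{u_0} .$$

Assume that $u_i/u_0 \le 2^{-2^i + 1}$. In that case,

$$\frac{u_{i+1}}{u_0} \le 2^{-2^{i+1}} + \frac{\delta}{u_0} \le \max\!\left( 2^{-2^{i+1} + 1}, \frac{2 \delta}{u_0} \right) .$$

Assume now that $2^{-2^i + 1} \le u_i/u_0 \le 2 \delta / u_0$. In that case,

$$\frac{u_{i+1}}{u_0} \le \frac{\delta}{u_0} \left( \frac{\delta}{u_0} + 1 \right) \le \frac{2 \delta}{u_0} = \max\!\left( 2^{-2^{i+1} + 1}, \frac{2 \delta}{u_0} \right) .$$

From now on we use the assumptions, notations and estimates of the
proof of Theorem 4.2. Combining (5) and (8) in (6), we obtain again
that

$$\|N(f, x)\| \le \frac{\gamma \|x\|^2}{\psi(\gamma \|x\|)} .$$

This time, this means that

$$\|x_{i+1}\| \gamma \le \delta + \|N(f, x_i)\| \gamma \le \delta + \frac{\gamma^2 \|x_i\|^2}{\psi(\gamma \|x_i\|)} .$$

By induction, $\|x_i - \zeta\| \gamma(f, \zeta) \le u_i$ and we are done. □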
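The majorant sequence of Theorem 4.9 is easy to tabulate. The sketch below iterates $u_{i+1} = u_i^2/\psi(u_i) + \delta$ and compares $u_i/u_0$ with $\max(2^{-2^i+1}, 2\delta/u_0)$; the sample values $u_0 = 0.1$ and $\delta = 10^{-6}$ are assumptions chosen to satisfy $2\delta \le u_0 \le 2 - \sqrt{14}/2$.

```python
def psi(u):
    return 1.0 - 4.0 * u + 2.0 * u * u

def inexact_majorant(u0, delta, steps):
    """Iterate u_{i+1} = u_i^2 / psi(u_i) + delta, starting from u0."""
    us = [u0]
    for _ in range(steps):
        u = us[-1]
        us.append(u * u / psi(u) + delta)
    return us

def bound(i, u0, delta):
    """Right-hand side max(2^(-2^i + 1), 2 delta / u0) of Theorem 4.9."""
    return max(2.0 ** (1 - 2**i), 2.0 * delta / u0)

u0, delta = 0.1, 1e-6
us = inexact_majorant(u0, delta, 6)
for i, u in enumerate(us):
    print(i, u / u0, bound(i, u0, delta))
```

The ratio first decays doubly exponentially and then stagnates near $\delta/u_0$, which is the accuracy floor imposed by the inexact steps.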
Exercise 4.1. Consider the following series, defined in $\mathbb{C}^2$:

$$g(x) = \sum_{i=0}^{\infty} x_1^i x_2^i .$$

Compute its radius of convergence. What is its domain of absolute
convergence?

Exercise 4.2. The objective of this exercise is to produce a non-optimal
algorithm to approximate $\sqrt{y}$. In order to do that, consider the
mapping $f(x) = x^2 - y$.

(1) Compute $\gamma(f, x)$.

(2) Show that for $1 \le y \le 4$, $x_0 = 1/2 + y/2$ is an approximate
zero of the first kind for $f$, associated to $\sqrt{y}$.

(3) Write down an algorithm to approximate $\sqrt{y}$ up to relative
accuracy $2^{-63}$.

Exercise 4.3. Let $f$ be an analytic map between Banach spaces, and
assume that $\zeta$ is a nondegenerate zero of $f$.

(1) Write down the Taylor series of $Df(\zeta)^{-1} (f(x) - f(\zeta))$.

(2) Show that if $f(x) = 0$, then

$$\gamma(f, \zeta) \|x - \zeta\| \ge 1/2 .$$

This shows that two nondegenerate zeros cannot be at a distance less
than $1/(2 \gamma(f, \zeta))$. Results of this type appeared in Dedieu (1997a), but
some of them were known before (Malajovich, 1993, Th. 16).



5. Estimates from data at a point 



Theorem 4.2 guarantees quadratic convergence in a neighborhood of
a known zero $\zeta$. In practical situations, $\zeta$ is not known. A major result
in alpha-theory is the criterion to detect an approximate zero with just 
local information. We need to slightly modify the definition. 

Definition 5.1 (Approximate zero of the second kind). Let $f : D \subseteq E \to F$
be as above. An approximate zero of the second kind
associated to $\zeta \in D$, $f(\zeta) = 0$, is a point $x_0 \in D$, such that

(1) The sequence $(x_i)$ defined inductively by $x_{i+1} = N(f, x_i)$ is
well-defined (each $x_i$ belongs to the domain of $f$ and $Df(x_i)$ is
invertible and bounded).

(2) $$\|x_{i+1} - x_i\| \le 2^{-2^i + 1} \|x_1 - x_0\| .$$

(3) $\lim_{i \to \infty} x_i = \zeta$.
For detecting approximate zeros of the second kind, we need:

Definition 5.2 (Smale's $\beta$ and $\alpha$ invariants).

$$\beta(f, x) = \|Df(x)^{-1} f(x)\| \quad \text{and} \quad \alpha(f, x) = \beta(f, x) \gamma(f, x).$$

The $\beta$ invariant can be interpreted as the size of the Newton step
$N(f, x) - x$.
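For a univariate polynomial, all three invariants are computable from the derivatives at one point. The following sketch assumes the coefficient-list convention (lowest degree first) and the sample polynomial $x^3 - 1$ at $x = 1.1$; the function names are hypothetical.

```python
import math

def derivatives(coeffs, x):
    """Values f(x), f'(x), ..., f^(d)(x) of the polynomial sum(coeffs[j] * x**j)."""
    c = list(coeffs)
    values = []
    while c:
        values.append(sum(a * x**j for j, a in enumerate(c)))
        c = [j * a for j, a in enumerate(c)][1:]
    return values

def alpha_beta_gamma(coeffs, x):
    """Return (alpha, beta, gamma) of a univariate polynomial at x."""
    d = derivatives(coeffs, x)
    if len(d) < 2 or d[1] == 0:
        return math.inf, math.inf, math.inf
    beta = abs(d[0] / d[1])                      # size of the Newton step
    gamma = max(((abs(d[k] / d[1]) / math.factorial(k)) ** (1.0 / (k - 1))
                 for k in range(2, len(d))), default=0.0)
    return beta * gamma, beta, gamma

alpha, beta, gamma = alpha_beta_gamma([-1, 0, 0, 1], 1.1)
print(alpha, beta, gamma)
```

Here $\beta \approx 0.0912$ and $\gamma \approx 0.909$, so $\alpha \approx 0.083$, computed without any knowledge of the zero at $x = 1$.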

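Theorem 5.3 (Smale). Let $f : D \subseteq E \to F$ be an analytic map between
Banach spaces. Let

$$\alpha \le \alpha_0 = \frac{13 - 3 \sqrt{17}}{4} .$$

Define

$$r_0 = \frac{1 + \alpha - \sqrt{1 - 6 \alpha + \alpha^2}}{4 \alpha} \quad \text{and} \quad r_1 = \frac{1 - 3 \alpha - \sqrt{1 - 6 \alpha + \alpha^2}}{4 \alpha} .$$

Let $x_0 \in D$ be such that $\alpha(f, x_0) \le \alpha$ and assume furthermore that
$B(x_0, r_0 \beta(f, x_0)) \subseteq D$. Then,

(1) $x_0$ is an approximate zero of the second kind, associated to some
zero $\zeta \in D$ of $f$.

(2) Moreover, $\|x_0 - \zeta\| \le r_0 \beta(f, x_0)$.

(3) Let $x_1 = N(f, x_0)$. Then $\|x_1 - \zeta\| \le r_1 \beta(f, x_0)$.

The constant $\alpha_0$ is the largest possible with those properties.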



This theorem appeared in Smale (1986). The value for $\alpha_0$ was found
by Wang Xinghua (1993). Numerically,

$$\alpha_0 = 0.157\,670\,780\,786\,754\,587\,633\,942\,608\,019 \cdots$$

Other useful numerical bounds, under the hypotheses of the theorem,
are:

$$r_0 \le 1.390\,388\,203 \cdots \quad \text{and} \quad r_1 \le 0.390\,388\,203 \cdots .$$
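These constants can be reproduced in double precision from the closed forms in Theorem 5.3. The sketch below is only a numerical check; the function name `radii` is an assumption.

```python
import math

alpha0 = (13.0 - 3.0 * math.sqrt(17.0)) / 4.0

def radii(alpha):
    """Return (r0, r1) of Theorem 5.3 for a given alpha > 0."""
    root = math.sqrt(1.0 - 6.0 * alpha + alpha * alpha)
    r0 = (1.0 + alpha - root) / (4.0 * alpha)
    r1 = (1.0 - 3.0 * alpha - root) / (4.0 * alpha)
    return r0, r1

r0, r1 = radii(alpha0)
print(alpha0, r0, r1)
```

Note that $r_1 = r_0 - 1$ for every $\alpha$, which matches the two numerical bounds above.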



The proof of Theorem 5.3 follows from the same method as the one
for Theorem 4.2. We first define the 'worst' real function with respect
to Newton iteration. Let us fix $\beta, \gamma > 0$. Define

$$h_{\beta\gamma}(t) = \beta - t + \frac{\gamma t^2}{1 - \gamma t} = \beta - t + \gamma t^2 + \gamma^2 t^3 + \cdots .$$

We assume for the time being that $\alpha = \beta \gamma < 3 - 2 \sqrt{2} = 0.1715 \cdots$.
This guarantees that $h_{\beta\gamma}$ has two distinct zeros
$\zeta_1 = \frac{1 + \alpha - \sqrt{\Delta}}{4 \gamma}$ and $\zeta_2 = \frac{1 + \alpha + \sqrt{\Delta}}{4 \gamma}$,
with of course $\Delta = (1 + \alpha)^2 - 8 \alpha$. A useful expression is the
product formula

(9) $$h_{\beta\gamma}(x) = 2 \, \frac{(x - \zeta_1)(x - \zeta_2)}{\gamma^{-1} - x} .$$

Figure 4. $y = h_{\beta\gamma}(t)$.

From (9), $h_{\beta\gamma}$ has also a pole at $\gamma^{-1}$. We have always $0 < \zeta_1 < \zeta_2 < \gamma^{-1}$.

The function $h_{\beta\gamma}$ is, among the functions with $h'(0) = -1$ and
$\beta(h, 0) \le \beta$ and $\gamma(h, 0) \le \gamma$, the one that has the first zero $\zeta_1$
furthest away from the origin.

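Proposition 5.4. Let $\beta, \gamma > 0$, with $\alpha = \beta \gamma < 3 - 2 \sqrt{2}$. Let $h_{\beta\gamma}$ be as
above. Define recursively $t_0 = 0$ and $t_{i+1} = N(h_{\beta\gamma}, t_i)$. Then

(10) $$t_i = \zeta_1 \frac{1 - q^{2^i - 1}}{1 - \eta q^{2^i - 1}},$$

with

$$\eta = \frac{\zeta_1}{\zeta_2} = \frac{1 + \alpha - \sqrt{\Delta}}{1 + \alpha + \sqrt{\Delta}} \quad \text{and} \quad q = \frac{\zeta_1 - \gamma \zeta_1 \zeta_2}{\zeta_2 - \gamma \zeta_1 \zeta_2} = \frac{1 - \alpha - \sqrt{\Delta}}{1 - \alpha + \sqrt{\Delta}} .$$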

Proof. By differentiating (9), one obtains

$$h_{\beta\gamma}'(t) = h_{\beta\gamma}(t) \left( \frac{1}{t - \zeta_1} + \frac{1}{t - \zeta_2} + \frac{1}{\gamma^{-1} - t} \right)$$

and hence the Newton operator is

$$N(h_{\beta\gamma}, t) = t - \frac{1}{\frac{1}{t - \zeta_1} + \frac{1}{t - \zeta_2} + \frac{1}{\gamma^{-1} - t}} .$$

A tedious calculation shows that $N(h_{\beta\gamma}, t)$ is a rational function of
degree 2. Hence, it is defined by 5 coefficients, or by 5 values.

In order to solve the recurrence for $t_i$, we change coordinates using
a fractional linear transformation. As the Newton operator will have
two attracting fixed points ($\zeta_1$ and $\zeta_2$), we will map those points to
$0$ and $\infty$ respectively. For convenience, we will map $t_0 = 0$ into $y_0 = 1$.
Therefore, we set

$$S(t) = \frac{\zeta_2 t - \zeta_1 \zeta_2}{\zeta_1 t - \zeta_1 \zeta_2} \quad \text{and} \quad S^{-1}(y) = \frac{- \zeta_1 \zeta_2 y + \zeta_1 \zeta_2}{- \zeta_1 y + \zeta_2} .$$

Let us look at the sequence $y_i = S(t_i)$. By construction $y_0 = 1$, and
subsequent values are given by the recurrence

$$y_{i+1} = S(N(h_{\beta\gamma}, S^{-1}(y_i))).$$

It is an exercise to check that

(11) $$y_{i+1} = q y_i^2 .$$

Therefore we have $y_i = q^{2^i - 1}$ and equation (10) holds. □
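The closed form (10) can be compared against a direct Newton iteration on $h_{\beta\gamma}$. This is a sketch under the assumed sample values $\beta = 0.1$, $\gamma = 1$ (so $\alpha = 0.1 < 3 - 2\sqrt{2}$); the function names are illustrative.

```python
import math

def newton_sequence(beta, gamma, steps):
    """Newton iterates t_0 = 0, t_{i+1} = N(h, t_i) for h = h_{beta gamma}."""
    def h(t):
        return beta - t + gamma * t * t / (1.0 - gamma * t)
    def dh(t):
        return -1.0 + (2.0 * gamma * t - (gamma * t) ** 2) / (1.0 - gamma * t) ** 2
    ts = [0.0]
    for _ in range(steps):
        t = ts[-1]
        ts.append(t - h(t) / dh(t))
    return ts

def closed_form(beta, gamma, steps):
    """Values of formula (10) for i = 0..steps."""
    alpha = beta * gamma
    root = math.sqrt((1.0 + alpha) ** 2 - 8.0 * alpha)
    zeta1 = (1.0 + alpha - root) / (4.0 * gamma)
    eta = (1.0 + alpha - root) / (1.0 + alpha + root)
    q = (1.0 - alpha - root) / (1.0 - alpha + root)
    return [zeta1 * (1.0 - q ** (2**i - 1)) / (1.0 - eta * q ** (2**i - 1))
            for i in range(steps + 1)]

iterated = newton_sequence(0.1, 1.0, 5)
predicted = closed_form(0.1, 1.0, 5)
for a, b in zip(iterated, predicted):
    print(a, b)
```

The first step gives $t_1 = \beta$ in both columns, since $h_{\beta\gamma}(0) = \beta$ and $h_{\beta\gamma}'(0) = -1$.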



Proposition 5.5. Under the conditions of Proposition 5.4, $0$ is an
approximate zero of the second kind for $h_{\beta\gamma}$ if and only if

$$\alpha = \beta \gamma \le \frac{13 - 3 \sqrt{17}}{4} .$$

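Proof. Using the closed form for $t_i$, we get:

$$t_{i+1} - t_i = \zeta_1 \left( \frac{1 - q^{2^{i+1} - 1}}{1 - \eta q^{2^{i+1} - 1}} - \frac{1 - q^{2^i - 1}}{1 - \eta q^{2^i - 1}} \right) = \zeta_1 q^{2^i - 1} \frac{(1 - \eta)(1 - q^{2^i})}{(1 - \eta q^{2^{i+1} - 1})(1 - \eta q^{2^i - 1})} .$$

In the particular case $i = 0$,

$$t_1 - t_0 = \zeta_1 \frac{1 - q}{1 - \eta q} = \beta .$$

Hence

$$\frac{t_{i+1} - t_i}{\beta} = C_i q^{2^i - 1}$$

with

$$C_i = \frac{(1 - \eta)(1 - \eta q)(1 - q^{2^i})}{(1 - q)(1 - \eta q^{2^{i+1} - 1})(1 - \eta q^{2^i - 1})} .$$

Thus, $C_0 = 1$. The reader shall verify in Exercise 5.1 that $C_i$ is a
non-increasing sequence. Its limit is non-zero.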




From the above, it is clear that $0$ is an approximate zero of the
second kind if and only if $q \le 1/2$. Now, if we clear denominators and
rearrange terms in $(1 - \alpha - \sqrt{\Delta})/(1 - \alpha + \sqrt{\Delta}) = 1/2$, we obtain the
second degree polynomial

$$2 \alpha^2 - 13 \alpha + 2 = 0 .$$

This has solutions $(13 \pm 3 \sqrt{17})/4$. When $0 \le \alpha \le \alpha_0 = (13 - 3 \sqrt{17})/4$,
the polynomial values are positive and hence $q \le 1/2$. □

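Proof of Th. 5.3. Let $\beta = \beta(f, x_0)$ and $\gamma = \gamma(f, x_0)$. Let $h_{\beta\gamma}$ and the
sequence $t_i$ be as in Proposition 5.4. By construction, $\|x_1 - x_0\| = \beta = t_1 - t_0$.
We use the following notations:

$$\beta_i = \beta(f, x_i) \quad \text{and} \quad \gamma_i = \gamma(f, x_i).$$

Those will be compared to

$$\hat\beta_i = \beta(h_{\beta\gamma}, t_i) \quad \text{and} \quad \hat\gamma_i = \gamma(h_{\beta\gamma}, t_i).$$

Induction hypothesis: $\beta_i \le \hat\beta_i$ and for all $l \ge 2$,

$$\|Df(x_i)^{-1} D^lf(x_i)\| \le - \frac{h_{\beta\gamma}^{(l)}(t_i)}{h_{\beta\gamma}'(t_i)} .$$

The initial case when $i = 0$ holds by construction. So let us assume
that the hypothesis holds for $i$. We will estimate

(12) $$\beta_{i+1} \le \|Df(x_{i+1})^{-1} Df(x_i)\| \, \|Df(x_i)^{-1} f(x_{i+1})\|$$

and

(13) $$\gamma_{i+1} \le \sup_{k \ge 2} \left( \|Df(x_{i+1})^{-1} Df(x_i)\| \, \frac{\|Df(x_i)^{-1} D^kf(x_{i+1})\|}{k!} \right)^{\frac{1}{k-1}} .$$

By construction, $f(x_i) + Df(x_i)(x_{i+1} - x_i) = 0$. The Taylor expansion
of $f$ at $x_i$ is therefore

$$Df(x_i)^{-1} f(x_{i+1}) = \sum_{k \ge 2} \frac{1}{k!} Df(x_i)^{-1} D^kf(x_i) (x_{i+1} - x_i)^k .$$

Passing to norms,

$$\|Df(x_i)^{-1} f(x_{i+1})\| \le \frac{\beta_i^2 \gamma_i}{1 - \gamma_i \beta_i} .$$

The same argument shows that

$$- \frac{h_{\beta\gamma}(t_{i+1})}{h_{\beta\gamma}'(t_i)} = \frac{\beta(h_{\beta\gamma}, t_i)^2 \gamma(h_{\beta\gamma}, t_i)}{1 - \gamma(h_{\beta\gamma}, t_i) \beta(h_{\beta\gamma}, t_i)} .$$

From Lemma 4.6,

$$\|Df(x_{i+1})^{-1} Df(x_i)\| \le \frac{(1 - \beta_i \gamma_i)^2}{\psi(\beta_i \gamma_i)} .$$

Also, computing directly,

(14) $$\frac{h_{\beta\gamma}'(t_i)}{h_{\beta\gamma}'(t_{i+1})} = \frac{(1 - \hat\beta_i \hat\gamma_i)^2}{\psi(\hat\beta_i \hat\gamma_i)} .$$

We established that

$$\beta_{i+1} \le \frac{\beta_i^2 \gamma_i (1 - \beta_i \gamma_i)}{\psi(\beta_i \gamma_i)} \le \frac{\hat\beta_i^2 \hat\gamma_i (1 - \hat\beta_i \hat\gamma_i)}{\psi(\hat\beta_i \hat\gamma_i)} = \hat\beta_{i+1} .$$

Now the second part of the induction hypothesis:

$$Df(x_i)^{-1} D^lf(x_{i+1}) = \sum_{k \ge 0} \frac{1}{k!} Df(x_i)^{-1} D^{k+l}f(x_i) (x_{i+1} - x_i)^k .$$

Passing to norms and invoking the induction hypothesis,

$$\|Df(x_i)^{-1} D^lf(x_{i+1})\| \le \sum_{k \ge 0} - \frac{h_{\beta\gamma}^{(k+l)}(t_i) \hat\beta_i^k}{k! \, h_{\beta\gamma}'(t_i)}$$

and then using Lemma 4.6 and (14),

$$\|Df(x_{i+1})^{-1} D^lf(x_{i+1})\| \le \frac{(1 - \hat\beta_i \hat\gamma_i)^2}{\psi(\hat\beta_i \hat\gamma_i)} \sum_{k \ge 0} - \frac{h_{\beta\gamma}^{(k+l)}(t_i) \hat\beta_i^k}{k! \, h_{\beta\gamma}'(t_i)} .$$

A direct computation similar to (14) shows that

$$- \frac{h_{\beta\gamma}^{(l)}(t_{i+1})}{h_{\beta\gamma}'(t_{i+1})} = \frac{(1 - \hat\beta_i \hat\gamma_i)^2}{\psi(\hat\beta_i \hat\gamma_i)} \sum_{k \ge 0} - \frac{h_{\beta\gamma}^{(k+l)}(t_i) \hat\beta_i^k}{k! \, h_{\beta\gamma}'(t_i)}$$

and since the right-hand terms of the last two equations are equal, the
second part of the induction hypothesis proceeds. Dividing by $l!$, taking
$(l-1)$-th roots and maximizing over all $l$, we deduce that $\gamma_i \le \hat\gamma_i$.

Proposition 5.5 then implies that $x_0$ is an approximate zero.

The second and third statement follow respectively from

$$\|x_0 - \zeta\| \le \beta_0 + \beta_1 + \cdots = \zeta_1$$

and

$$\|x_1 - \zeta\| \le \beta_1 + \beta_2 + \cdots = \zeta_1 - \beta .$$

□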
  i     \alpha = 1/32     1/16      1/10      1/8     (13 - 3\sqrt{17})/4
  1          4.854       3.683     2.744     2.189         1.357
  2         14.472      10.865     7.945     6.227         3.767
  3         33.700      25.195    18.220    14.41          7.874
  4         72.157      53.854    38.767    29.648        15.881
  5        149.71      111.173    79.861    60.864        31.881
  6        302.899     225.811   162.49    123.295        63.881

Table 2. Values of $-\log_2(\|x_i - \zeta\|/\beta)$ in function of $\alpha$
and $i$.



The same issues as in Theorem 4.2 arise. First of all, we actually
proved a sharper statement. Namely,

Theorem 5.6. Let $f : D \subseteq E \to F$ be an analytic map between Banach
spaces. Let

$$\alpha < 3 - 2 \sqrt{2} .$$

Define

$$r = \frac{1 + \alpha - \sqrt{1 - 6 \alpha + \alpha^2}}{4 \alpha} .$$

Let $x_0 \in D$ be such that $\alpha(f, x_0) \le \alpha$ and assume furthermore that
$B(x_0, r \beta(f, x_0)) \subseteq D$. Then, the sequence $x_{i+1} = N(f, x_i)$ is well
defined, and there is a zero $\zeta \in D$ of $f$ such that

$$\|x_i - \zeta\| \le q^{2^i - 1} \frac{1 - \eta}{1 - \eta q^{2^i - 1}} \, r \beta(f, x_0)$$

for $\eta$ and $q$ as in Proposition 5.4.



Table 2 and Figure 5 show how fast $\|x_i - \zeta\|/\beta$ decreases in terms
of $\alpha$ and $i$.

The final issue is robustness. There is no obvious modification of the
proof of Theorem 5.3 to provide a nice statement, so we will rely on
Theorem 4.9 instead.

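Theorem 5.7. Let $f : D \subseteq E \to F$ be an analytic map between Banach
spaces. Let $\delta$, $\alpha$ and $u_0$ satisfy

$$0 \le 2 \delta \le u_0 = \frac{r \alpha}{(1 - r \alpha) \psi(r \alpha)} \le 2 - \frac{\sqrt{14}}{2}$$

with $r = \frac{1 + \alpha - \sqrt{1 - 6 \alpha + \alpha^2}}{4 \alpha}$. Assume that

(1) $$B = B(x_0, 2 r \beta(f, x_0)) \subseteq D .$$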
Figure 5. Values of $-\log_2(\|x_i - \zeta\|/\beta)$ in function of $\alpha$
for $i = 1$ to $6$.



(2) $x_0 \in B$, and the sequence $x_i$ satisfies

$$\|x_{i+1} - N(f, x_i)\| \, \frac{\gamma(f, x_0)}{(1 - r \alpha) \psi(r \alpha)} \le \delta .$$

(3) The sequence $u_i$ is defined inductively by

$$u_{i+1} = \frac{u_i^2}{\psi(u_i)} + \delta .$$

Then the sequences $u_i$ and $x_i$ are well-defined for all $i$, $x_i \in D$, and

$$\frac{\|x_i - \zeta\|}{\|x_1 - x_0\|} \le \frac{r u_i}{u_0} \le r \max\!\left( 2^{-2^i + 1}, \frac{2 \delta}{u_0} \right) .$$

Numerically, $\alpha_0 = 0.074\,290 \cdots$ satisfies the hypothesis of the
Theorem. A version of this theorem (not as sharp, and another metric)
appeared as Theorem 2 in Malajovich (1994).

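The following Lemma will be useful:

Lemma 5.8. Assume that $u = \gamma(f, x) \|x - y\| < 1 - \sqrt{2}/2$. Then,

$$\gamma(f, y) \le \frac{\gamma(f, x)}{(1 - u) \psi(u)} .$$

Proof. In order to estimate the higher derivatives, we expand:

$$\frac{1}{l!} Df(x)^{-1} D^lf(y) = \sum_{k \ge 0} \binom{k + l}{l} \frac{Df(x)^{-1} D^{k+l}f(x) (y - x)^k}{(k + l)!}$$

and by Lemma 4.3 for $d = l + 1$,

$$\frac{1}{l!} \|Df(x)^{-1} D^lf(y)\| \le \frac{\gamma(f, x)^{l-1}}{(1 - u)^{l+1}} .$$

Combining with Lemma 4.6,

$$\frac{1}{l!} \|Df(y)^{-1} D^lf(y)\| \le \frac{\gamma(f, x)^{l-1}}{(1 - u)^{l-1} \psi(u)} .$$

Taking the $(l-1)$-th root,

$$\gamma(f, y) \le \frac{\gamma(f, x)}{(1 - u) \psi(u)} .$$

□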

Proof of Theorem 5.7. We have necessarily $\alpha < 3 - 2 \sqrt{2}$ or $r$ is
undefined. Then (Theorem 5.6) there is a zero $\zeta$ of $f$ with
$\|x_0 - \zeta\| \le r \beta(f, x_0)$. Then, Lemma 5.8 implies that
$\|x_0 - \zeta\| \gamma(f, \zeta) \le u_0$. Now apply Theorem 4.9.

□

Exercise 5.1. The objective of this exercise is to show that $C_i$ is
non-increasing.

(1) Show the following trivial lemma: If $0 \le s < a \le b$, then
$\frac{a - s}{b - s} \le \frac{a}{b}$.

(2) Deduce that $q \le \eta$.

(3) Prove that $C_{i+1}/C_i \le 1$.

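Exercise 5.2. Show that

$$\zeta_1 \gamma(\zeta_1) = \frac{1 + \alpha - \sqrt{\Delta}}{3 - \alpha + \sqrt{\Delta}} \cdot \frac{1}{\psi\!\left( \frac{1 + \alpha - \sqrt{\Delta}}{4} \right)} .$$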

Part 2. Inclusion and exclusion 

6. Eckart-Young theorem

The following classical theorem in linear algebra is known as the
singular value decomposition (svd for short).

Theorem 6.1. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ (resp. $\mathbb{C}^n \to \mathbb{C}^m$) be linear. Then,
there are $\sigma_1 \ge \cdots \ge \sigma_r > 0$, $r \le m, n$, such that

$$A = U \Sigma V^*$$

with $U \in O(m)$ (resp. $U(m)$), $V \in O(n)$ (resp. $U(n)$) and $\Sigma_{ij} = \sigma_i$
for $i = j \le r$ and $0$ otherwise.

It is due to Sylvester (real $n \times n$ matrices) and to Eckart and Young
(1939) in the general case, now Exercise 6.1 below.

$\Sigma$ is an $m \times n$ matrix. It is possible to rewrite this in an 'economical'
formulation with $\Sigma$ an $r \times r$ matrix, $U$ and $V$ orthogonal (resp. unitary)
$m \times r$ and $n \times r$ matrices. The numbers $\sigma_1, \dots, \sigma_r$ are called singular
values of $A$. They may be computed by extracting the positive square
root of the non-zero eigenvalues of $A^* A$ or $A A^*$, whichever matrix is
smaller. The operator and Frobenius norm of $A$ may be written in
terms of the $\sigma_i$'s:

$$\|A\|_2 = \sigma_1, \qquad \|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2} .$$

The discussion and the results above hold when $A$ is a linear operator
between finite dimensional inner product spaces. It suffices to choose
an orthonormal basis, and apply Theorem 6.1 to the corresponding
matrix.

When $m = n = r$, $\|A^{-1}\|_2 = \sigma_n^{-1}$. In this case, the condition
number of $A$ for linear solving is defined as

$$\kappa(A) = \|A\| \, \|A^{-1}\| .$$

The choice of norms is arbitrary, as long as operator and vector norms
are consistent. Two canonical choices are

$$\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 \quad \text{and} \quad \kappa_D(A) = \|A\|_F \|A^{-1}\|_2 .$$
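For a $2 \times 2$ real matrix the singular values, and hence both condition numbers, follow from the eigenvalues of $A^T A$ in closed form. The sketch below assumes that restricted setting; the function name and the sample matrix are illustrative.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]] via the eigenvalues of A^T A."""
    p = a * a + c * c          # (A^T A)[0][0]
    r = b * b + d * d          # (A^T A)[1][1]
    q = a * b + c * d          # (A^T A)[0][1]
    mean = (p + r) / 2.0
    dev = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    return math.sqrt(mean + dev), math.sqrt(max(mean - dev, 0.0))

# Sample matrix (an assumption for illustration): A = [[3, 0], [4, 5]].
s1, s2 = singular_values_2x2(3.0, 0.0, 4.0, 5.0)
frobenius = math.sqrt(3.0**2 + 0.0**2 + 4.0**2 + 5.0**2)
kappa_2 = s1 / s2              # ||A||_2 ||A^{-1}||_2
kappa_D = frobenius / s2       # ||A||_F ||A^{-1}||_2
print(s1, s2, kappa_2, kappa_D)
```

For this matrix $\sigma_1 = \sqrt{45}$ and $\sigma_2 = \sqrt{5}$, so $\kappa_2 = 3$ and $\kappa_D = \sqrt{10}$, and the Frobenius norm agrees with $\sqrt{\sigma_1^2 + \sigma_2^2}$.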



The second choice was suggested by Demmel (1988). Using that
definition he obtained bounds on the probability that a matrix is poorly
conditioned. The exact probability distribution for the most usual
probability measures in matrix space was computed in Edelman (1992).



Assume that $A(t) x(t) = b(t)$ is a family of problems and solutions
depending smoothly on a parameter $t$. Differentiating implicitly,

$$\dot{A} x + A \dot{x} = \dot{b}$$

which amounts to

$$\dot{x} = A^{-1} \dot{b} - A^{-1} \dot{A} x .$$

Passing to norms and to relative errors, we quickly obtain

$$\frac{\|\dot{x}\|}{\|x\|} \le \kappa_D(A) \left( \frac{\|\dot{A}\|_F}{\|A\|_F} + \frac{\|\dot{b}\|}{\|b\|} \right) .$$

This bounds the relative error in the solution $x$ in terms of the
relative error in the coefficients. The usual paradigm in numerical linear
algebra dates from Turing (1948) and Wilkinson (1994). After the
rounding-off during computation, we obtain the exact solution of a
perturbed system. Bounds for the perturbation or backward error
are found through line by line analysis of the algorithm. The output
error or forward error is bounded by the backward error, times the
condition number.

Condition numbers provide therefore an important metric invariant 
for numerical analysis problems. A geometric interpretation in the case 
of linear equation solving is: 

Theorem 6.2. Let A be a nondegenerate square matrix.

\[ \|A^{-1}\|_2^{-1} = \min_{\det(A+B)=0} \|B\|_F. \]

In particular, this implies that

\[ \kappa_D(A)^{-1} = \min_{\det(A+B)=0} \frac{\|B\|_F}{\|A\|_F}. \]

A pervading principle in the subject is: the inverse of the condi- 
tion number is related to the distance to the ill-posed prob- 
lems. 
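The distance in Theorem 6.2 is attained by a rank-one perturbation built from the singular value decomposition. The sketch below (NumPy assumed, sample matrix chosen for illustration) constructs that perturbation and checks that its Frobenius norm equals $\|A^{-1}\|_2^{-1}$.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
U, sigma, Vt = np.linalg.svd(A)

# Rank-one perturbation that cancels the smallest singular value of A.
B = -sigma[-1] * np.outer(U[:, -1], Vt[-1, :])

dist = np.linalg.norm(B, "fro")                         # equals sigma_n
inv_norm = np.linalg.norm(np.linalg.inv(A), 2)          # equals 1 / sigma_n
smallest_after = np.linalg.svd(A + B, compute_uv=False)[-1]   # A + B is singular
```

This only exhibits one singular matrix at distance $\sigma_n$; that no closer one exists is the content of the theorem.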

It is possible to define the condition number for a full-rank non-square
matrix by

\[ \kappa_D(A) = \|A\|_F \, \sigma_{\min(m,n)}(A)^{-1}. \]

Theorem 6.3. (Eckart and Young, 1936) Let A be an m x n matrix
of rank r. Then,

\[ \sigma_r(A) = \min_{\sigma_r(A+B)=0} \|B\|_F. \]

In particular, if r = min(m,n),

\[ \kappa_D(A)^{-1} = \min_{\sigma_r(A+B)=0} \frac{\|B\|_F}{\|A\|_F}. \]



Exercise 6.1. Prove Theorem 6.1. Hint: let u, v, $\sigma$ such that $Av = \sigma u$
with $\sigma$ maximal, $\|u\| = 1$, $\|v\| = 1$. What can you say about $A_{|v^\perp}$?

Exercise 6.2. Prove Theorem 6.3.

Exercise 6.3. Assume furthermore that m < n. Show that the same 
interpretation for the condition number still holds, namely the norm of 
the perturbation of some solution is bounded by the condition number, 
times the perturbation of the input. 



7. The space of homogeneous polynomial systems

We will denote by $\mathcal{H}_d$ the space of homogeneous polynomials of degree d in n + 1
variables. This space can be identified with the space of symmetric
d-linear forms. For instance, when d = 2, the polynomial

\[ f(x_0, x_1) = f_0 x_0^2 + f_1 x_0 x_1 + f_2 x_1^2 = \begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} f_0 & f_1/2 \\ f_1/2 & f_2 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \]

can be identified with a symmetric bilinear form and can be represented
by a matrix. In general, a homogeneous polynomial can be represented
by a symmetric tensor

\[ f(x) = \sum_{|a|=d} f_a x_0^{a_0} \cdots x_n^{a_n} = \sum_{0 \le i_1, \dots, i_d \le n} T_{i_1 i_2 \dots i_d} x_{i_1} x_{i_2} \cdots x_{i_d} \]

where

\[ f_a = \sum_{e_{i_1} + \cdots + e_{i_d} = a} T_{i_1 i_2 \dots i_d}. \]

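The canonical inner product for tensors is given by

\[ \langle S, T \rangle = \sum_{0 \le i_1, \dots, i_d \le n} S_{i_1 i_2 \dots i_d} T_{i_1 i_2 \dots i_d}. \]

The same inner product for polynomials is written

\[ \langle f, g \rangle = \sum_{|a|=d} \frac{f_a g_a}{\binom{d}{a}} \]

where $\binom{d}{a} = \frac{d!}{a_0! \cdots a_n!}$ is the coefficient of $x^a$ in $(x_0 + \cdots + x_n)^d$.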

Lemma 7.1. Let Q be an orthogonal n x n matrix, that is $Q^T Q = I$.
Then,

\[ \langle f \circ Q, g \circ Q \rangle = \langle f, g \rangle. \]

Exercise 7.1. Prove Lemma 7.1.

We say that the above inner product is invariant under orthogonal
action. We will always assume this inner product for $\mathcal{H}_d$.
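The invariance can be checked by hand for binary quadratic forms, where composing with Q amounts to the congruence $M \mapsto Q^T M Q$ of the representing matrix. The sketch below is an illustration under that assumption (d = 2, two variables); the helper names and the sample coefficients are hypothetical.

```python
import numpy as np

def weyl_inner_binary_quadratic(f, g):
    """<f, g> for f = f0*x0^2 + f1*x0*x1 + f2*x1^2, each term weighted by 1/binom(2, a)."""
    return f[0] * g[0] + f[1] * g[1] / 2 + f[2] * g[2]

def as_matrix(f):
    """Symmetric matrix M with f(x) = x^T M x."""
    return np.array([[f[0], f[1] / 2], [f[1] / 2, f[2]]])

def from_matrix(M):
    """Coefficients (f0, f1, f2) of the quadratic form x^T M x."""
    return np.array([M[0, 0], 2 * M[0, 1], M[1, 1]])

f = np.array([1.0, -2.0, 3.0])
g = np.array([0.5, 4.0, -1.0])
t = 0.3
Q = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])   # a rotation, Q^T Q = I

fQ = from_matrix(Q.T @ as_matrix(f) @ Q)     # coefficients of f o Q
gQ = from_matrix(Q.T @ as_matrix(g) @ Q)     # coefficients of g o Q
before = weyl_inner_binary_quadratic(f, g)
after = weyl_inner_binary_quadratic(fQ, gQ)
```

The two values agree up to rounding, as Lemma 7.1 predicts; a numerical check of course does not replace the proof asked for in Exercise 7.1.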

It is also important to notice that $\mathcal{H}_d$ is a reproducing
kernel space. Let

\[ K_d(x, y) = \langle x, y \rangle^d. \]

Then

\[ f(y) = \langle f(\cdot), K_d(\cdot, y) \rangle, \]
\[ Df(y)u = \langle f(\cdot), D_y K_d(\cdot, y) u \rangle, \]

etc.
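The first identity can be verified directly from the definition of the inner product. A short derivation, using the multinomial expansion of $K_d(\cdot, y)$ and the coefficient notation $f_a$, $\binom{d}{a}$ of this section:

```latex
\begin{align*}
K_d(x, y) &= \langle x, y \rangle^d
           = \sum_{|a|=d} \binom{d}{a} \, y^a x^a , \\
\langle f(\cdot), K_d(\cdot, y) \rangle
          &= \sum_{|a|=d} \frac{f_a \, \binom{d}{a} \, y^a}{\binom{d}{a}}
           = \sum_{|a|=d} f_a \, y^a
           = f(y) .
\end{align*}
```

The identity for $Df(y)u$ follows by differentiating both sides with respect to y in the direction u.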
8. The condition number

Now, let's denote by $\mathcal{H}_d$ the space of systems of homogeneous
polynomials of degree $d = (d_1, \dots, d_n)$. The condition number measures
how the solution of an equation depends upon the coefficients.

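Therefore, assume that both a polynomial system $f \in S(\mathcal{H}_d)$ and a
point $x \in S(\mathbb{R}^{n+1})$ depend upon a parameter t. Say,

\[ f_t(x_t) = 0. \]

Differentiating, one gets

\[ Df_t(x_t) \, \dot x_t = -\dot f_t(x_t) \]

so

\[ \|\dot x_t\| \le \left\| Df_t(x_t)_{|x_t^\perp}^{-1} \right\| \, \|\dot f_t(x_t)\|. \tag{15} \]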

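The normalized condition number is defined for $f \in \mathcal{H}_d$ and
$x \in \mathbb{R}^{n+1}$ as

\[ \mu(f, x) = \|f\| \, \left\| \left( \begin{bmatrix} d_1^{-1/2} \|x\|^{-d_1+1} & & \\ & \ddots & \\ & & d_n^{-1/2} \|x\|^{-d_n+1} \end{bmatrix} Df(x)_{|x^\perp} \right)^{-1} \right\|. \]

In the special case $f \in S(\mathcal{H}_d)$ and $x \in S(\mathbb{R}^{n+1})$,

\[ \mu(f, x) = \left\| Df(x)_{|x^\perp}^{-1} \begin{bmatrix} d_1^{1/2} & & \\ & \ddots & \\ & & d_n^{1/2} \end{bmatrix} \right\|. \]

Proposition 8.1.

(1) If $f_t$ and $x_t$ are paths in $S(\mathcal{H}_d)$ and $S(\mathbb{R}^{n+1})$ respectively, and
$f_t(x_t) = 0$ then

\[ \|\dot x_t\| \le \mu(f_t, x_t) \, \|\dot f_t\|. \]

(2) Let $x \in S(\mathbb{R}^{n+1})$ be fixed. Then the mapping

\[ \pi : \mathcal{H}_d \to L(x^\perp, \mathbb{R}^n), \qquad f \mapsto \begin{bmatrix} d_1^{-1/2} & & \\ & \ddots & \\ & & d_n^{-1/2} \end{bmatrix} Df(x)_{|x^\perp} \]

restricts to an isometry $\pi_{|(\ker \pi)^\perp} : (\ker \pi)^\perp \to L(x^\perp, \mathbb{R}^n)$.

(3) Let $f \in S(\mathcal{H}_d)$ and $x \in S(\mathbb{R}^{n+1})$. Then,

\[ \mu(f, x) = \frac{1}{\min \{ \|f - g\| : Dg(x)_{|x^\perp} \text{ singular} \}}. \]

(4) If furthermore f(x) = 0,

\[ \mu(f, x) = \frac{1}{\min \{ \|f - g\| : g(x) = 0 \text{ and } Dg(x)_{|x^\perp} \text{ singular} \}}. \]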
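For a single binary form (n = 1) the matrix $Df(x)_{|x^\perp}$ is a scalar, and the definition reduces to $\mu(f,x) = \|f\| \sqrt{d} / |Df(x)u|$ with u a unit vector orthogonal to x. The following sketch evaluates it under that assumption; the coefficient convention (`c[k]` multiplies $x_0^{d-k} x_1^k$) and the function names are illustrative.

```python
import math
import numpy as np

def weyl_norm(c):
    """Norm of the binary form sum_k c[k] x0^(d-k) x1^k, with d = len(c) - 1."""
    d = len(c) - 1
    return math.sqrt(sum(ck ** 2 / math.comb(d, k) for k, ck in enumerate(c)))

def gradient(c, x):
    """Gradient of the binary form at the point x = (x0, x1)."""
    d = len(c) - 1
    x0, x1 = x
    g0 = sum(ck * (d - k) * x0 ** (d - k - 1) * x1 ** k for k, ck in enumerate(c) if k < d)
    g1 = sum(ck * k * x0 ** (d - k) * x1 ** (k - 1) for k, ck in enumerate(c) if k > 0)
    return np.array([g0, g1])

def mu(c, x):
    """Normalized condition number of one binary form at a unit vector x."""
    d = len(c) - 1
    u = np.array([-x[1], x[0]])          # unit vector spanning x-perp
    slope = gradient(c, x) @ u           # Df(x) restricted to x-perp (a 1 x 1 matrix)
    return math.inf if slope == 0 else weyl_norm(c) * math.sqrt(d) / abs(slope)

root = np.array([1.0, 1.0]) / math.sqrt(2)
mu_at_root = mu([1.0, 0.0, -1.0], root)                    # f = x0^2 - x1^2 at a root
mu_at_pole = mu([1.0, 0.0, -1.0], np.array([1.0, 0.0]))    # Df(x) vanishes on x-perp here
```

At the root the value is 1, the smallest value allowed by Exercise 8.1 when n = 1; at (1, 0) the restricted derivative is singular and the condition number is infinite.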



Proof. Item 1 follows from (15). In order to prove item 2, let
$x \in S(\mathbb{R}^{n+1})$ be fixed and let $f \in \mathcal{H}_d$. Assume that $y \perp x$. We can write
f(x + y) as

\[ f(x + y) = f(x) + Df(x)_{|x^\perp} \, y + \frac{1}{2} D^2 f(x)_{|x^\perp}(y, y) + \cdots \]

This suggests a decomposition of $\mathcal{H}_d$ into terms that are 'constant',
'linear' or 'higher order' at x.

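An orthonormal basis for $H_1$ would be

\[ \left( \frac{1}{\sqrt{d_i}} \frac{\partial K_{d_i}(\cdot, x)}{\partial u_j} \, e_i \right)_{1 \le i, j \le n} \]

where $(u_1, \dots, u_n)$ is an orthonormal basis of $x^\perp$ and $(e_1, \dots, e_n)$ is the
canonical basis of $\mathbb{R}^n$.

In this basis, the projection of f in $H_1$ is just

\[ \left[ \left\langle f_i(\cdot), \frac{1}{\sqrt{d_i}} \frac{\partial K_{d_i}(\cdot, x)}{\partial u_j} \right\rangle \right]_{i,j} = \begin{bmatrix} d_1^{-1/2} & & \\ & \ddots & \\ & & d_n^{-1/2} \end{bmatrix} Df(x)_{|x^\perp}. \]

Thus, the subspace $H_1$ of $\mathcal{H}_d$ is isomorphic to the space of n x n
matrices. Moreover, $\pi : \mathcal{H}_d \to H_1$ is an orthogonal projection. Items
3 and 4 follow now easily from Theorem 6.3. □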

Exercise 8.1. Deduce that for all $f \in \mathcal{H}_d$, $x \in S^n$, $\mu(f, x) \ge \sqrt{n}$.

We denote by $\rho(x, y) = \angle(x, 0, y)$ the angular distance between $x \in S^n$
and $y \in S^n$. The following estimate is quite useful:

Theorem 8.2. Let $f, g \in S(\mathcal{H}_d)$ and let $x, y \in S(\mathbb{R}^{n+1})$. Let
$u = (\max d_i) \, \mu(f, x) \, \rho(x, y)$ and $v = \mu(f, x) \|f - g\|$.
Then,

\[ \frac{1}{1 + u + v} \, \mu(f, x) \le \mu(g, y) \le \frac{1}{1 - u - v} \, \mu(f, x). \]



Remark 8.3. Similar formulas appeared in Bürgisser and Cucker (2011)
and Dedieu et al. (2012). The final form here appeared in Malajovich
(2011) and generalizes to the sparse condition number.

Proof. Let R be a rotation taking y to x. Then, $\mu(g, y) = \mu(g \circ R, x)$.
Moreover, it is easy to check that $\|g \circ R - g\| \le (\max d_i) \rho(x, y)$. Thus,

\[ \mu(f, x) \|f - g \circ R\| \le (u + v). \]

Now, notice that Proposition 8.1(3) implies:

\[ \frac{1}{\mu(f, x)} - \|f - g \circ R\| \le \frac{1}{\mu(g \circ R, x)} \le \frac{1}{\mu(f, x)} + \|f - g \circ R\|. \]

The theorem follows by taking inverses. □

9. The inclusion theorem



For any $f \in S(\mathcal{H}_d)$ and $x \in S^n$, we denote by $A_x$ the affine space $x + x^\perp$ and
by $F_x : A_x \to \mathbb{R}^n$, $X \mapsto f(x + X)$ the restriction of f to $A_x$. Then $F_x$
is an n-variate polynomial system of degree d.



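Lemma 9.1. (Shub and Smale, 1993)

\[ \gamma(F_x, 0) \le \frac{\max d_i^{3/2}}{2} \, \mu(f, x). \]

Proof. For simplicity assume $\|f\| = 1$. Let $k \ge 2$ and

\[ A = \begin{bmatrix} d_1^{1/2} & & \\ & \ddots & \\ & & d_n^{1/2} \end{bmatrix}. \]

Then

\begin{align*}
\frac{1}{k!} \left\| DF_x(0)^{-1} D^k F_x(0) \right\|
  &= \frac{1}{k!} \left\| Df(x)_{|x^\perp}^{-1} D^k f(x)_{|x^\perp} \right\| \\
  &\le \left\| Df(x)_{|x^\perp}^{-1} A \right\| \, \frac{1}{k!} \left\| A^{-1} D^k f(x)_{|x^\perp} \right\| \\
  &\le \mu(f, x) \, \frac{1}{k!} \left\| A^{-1} D^k f(x)_{|x^\perp} \right\|.
\end{align*}

Now, notice that

\begin{align*}
\left\| D^k f_i(x)_{|x^\perp} \right\|
  &= \sup \left| \left\langle f_i, D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k) \right\rangle \right| \\
  &\le \|f_i\| \sup \left\| D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k) \right\|,
\end{align*}

where both suprema are taken over $\|u_1\| = \cdots = \|u_k\| = 1$ with $u_1, \dots, u_k \perp x$, and
where $K_{d_i}(y, x) = \langle y, x \rangle^{d_i}$ is the reproducing kernel of $\mathcal{H}_{d_i}$. Differentiating
with respect to x, one obtains:

\[ \frac{1}{k!} D^k K_{d_i}(y, x)(u_1, \dots, u_k) = \binom{d_i}{k} \langle y, x \rangle^{d_i - k} \langle y, u_1 \rangle \cdots \langle y, u_k \rangle. \]

The norm of $\frac{1}{k!} D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k)$ (as a polynomial of y) can be
computed using the reproducing kernel property.

\begin{align*}
\left\| \frac{1}{k!} D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k) \right\|^2
  &= \left\langle \frac{1}{k!} D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k), \frac{1}{k!} D^k K_{d_i}(\cdot, x)(u_1, \dots, u_k) \right\rangle \\
  &= \frac{1}{k!} \frac{\partial}{\partial u_1} \cdots \frac{\partial}{\partial u_k} \left[ \binom{d_i}{k} \langle y, x \rangle^{d_i - k} \langle y, u_1 \rangle \cdots \langle y, u_k \rangle \right]_{y = x} \\
  &\le \binom{d_i}{k}.
\end{align*}

It follows that

\[ \frac{1}{k!} \left\| DF_x(0)^{-1} D^k F_x(0) \right\| \le \mu(f, x) \max_i \, d_i^{-1/2} \binom{d_i}{k}^{1/2}. \]

Estimating the right-hand side with the help of an earlier exercise,

\[ \gamma(F_x, 0) \le \frac{\max d_i^{3/2}}{2} \, \mu(f, x). \]

□

Whenever the sequence $(X_k)_{k \in \mathbb{N}}$ defined by $X_0 = 0$, $X_{k+1} = N(F_x, X_k)$
converges, let $X^* = \lim X_k$ and define

\[ \zeta_x = \frac{x + X^*}{\|x + X^*\|} \in S^n. \]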
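The construction of $\zeta_x$ can be sketched in a few lines: parametrize $x^\perp$ by an orthonormal basis, run Newton's iteration for $F_x$ from 0, and project the limit to the sphere. This is an illustration only; the helper names, the fixed number of steps and the sample system are assumptions, and no convergence test is performed.

```python
import numpy as np

def complement_basis(x):
    """Columns form an orthonormal basis of the orthogonal complement of x."""
    q, _ = np.linalg.qr(x.reshape(-1, 1), mode="complete")
    return q[:, 1:]

def zeta(f, Df, x, steps=20):
    """Newton iteration on F_x(X) = f(x + X), X in x-perp, starting at X_0 = 0;
    the last iterate is projected back to the unit sphere."""
    U = complement_basis(x)
    c = np.zeros(U.shape[1])                         # coordinates of X_k in the basis U
    for _ in range(steps):
        p = x + U @ c
        c = c - np.linalg.solve(Df(p) @ U, f(p))     # Newton step for F_x
    p = x + U @ c
    return p / np.linalg.norm(p)

# Hypothetical example with n = 1: f(x0, x1) = x0^2 - x1^2.
f = lambda p: np.array([p[0] ** 2 - p[1] ** 2])
Df = lambda p: np.array([[2 * p[0], -2 * p[1]]])
x = np.array([np.cos(0.7), np.sin(0.7)])             # close to the root direction (1, 1)
z = zeta(f, Df, x)
```

Starting near the direction (1, 1), the iteration converges to the root $(1, 1)/\sqrt{2}$ of f on the circle.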



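As in Theorem 5.3 define

\[ r_0(\alpha) = \frac{1 + \alpha - \sqrt{1 - 6\alpha + \alpha^2}}{4\alpha}. \]

Let $\alpha_*$ be the smallest positive root of

\[ \alpha_* = \alpha_0 (1 - \alpha_* r_0(\alpha_*))^2. \]

Numerically, $\alpha_* > 0.116$. (This is better than (Cucker et al., 2008)).
Let $B_x = \{ y \in S^n : \rho(x, y) \le r_x \}$ with $r_x = r_0(\alpha_*) \mu(f, x) \|f(x)\|$.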
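The numerical value of $\alpha_*$ can be reproduced by bisection. The sketch below assumes that $\alpha_0$ is the usual constant $(13 - 3\sqrt{17})/4$ of alpha-theory; that value comes from outside this section and should be checked against Theorem 5.3. On the bracket used, the function g is increasing, so the root found is the only one there.

```python
import math

ALPHA_0 = (13 - 3 * math.sqrt(17)) / 4      # assumed value of alpha_0, about 0.1577

def r0(alpha):
    return (1 + alpha - math.sqrt(1 - 6 * alpha + alpha ** 2)) / (4 * alpha)

def g(alpha):
    """alpha_* is a root of g."""
    return alpha - ALPHA_0 * (1 - alpha * r0(alpha)) ** 2

lo, hi = 1e-6, 0.15                          # g(lo) < 0 < g(hi)
for _ in range(60):                          # plain bisection
    mid = (lo + hi) / 2
    if g(mid) < 0:
        lo = mid
    else:
        hi = mid
alpha_star = (lo + hi) / 2
```

Under the stated assumption the computed root lies slightly above 0.116, in agreement with the bound quoted in the text.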
Theorem 9.2. Let $f \in S(\mathcal{H}_d)$ and $x \in S^n$ be such that

\[ \frac{\max d_i^{3/2}}{2} \, \mu(f, x)^2 \|f(x)\| \le \alpha_*. \]

Then,

(1) $\alpha(F_x, 0) \le \alpha_*$.

(2) 0 is an approximate zero of the second kind of $F_x$, and in
particular $f(\zeta_x) = 0$.

(3) $\zeta_x \in B_x$.

(4) For any $z \in B_x$, $\zeta_z = \zeta_x$.



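Proof. (1) By Lemma 9.1,

\[ \alpha(F_x, 0) \le \frac{\max d_i^{3/2}}{2} \, \mu(f, x) \left\| Df(x)_{|x^\perp}^{-1} f(x) \right\| \le \frac{\max d_i^{3/2}}{2} \, \mu(f, x)^2 \|f(x)\| \le \alpha_*. \]

(2) Since $\alpha_* < \alpha_0$, we can apply Theorem 5.3 to $F_x$ and 0.

(3) Since 0 is an approximate zero of the second kind for $F_x$,

\[ F_x(X^*) = f(\|x + X^*\| \zeta_x) = 0 \]

and hence by homogeneity $f(\zeta_x) = 0$.

(4)

\[ \rho(x, \zeta_x) \le \tan \rho(x, \zeta_x) \le r_0(\alpha_*) \beta(F_x, 0) \le r_0(\alpha_*) \mu(f, x) \|f(x)\|. \]

(5) By Theorem 8.2,

\[ \mu(f, z) \le \frac{\mu(f, x)}{1 - (\max d_i) \mu(f, x) \rho(x, z)} \le \frac{\mu(f, x)}{1 - \alpha_* r_0(\alpha_*)} \]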

and hence, as in item 1: 



□ 



See Cucker et al. (2008). For other inclusion results in alpha-theory, see
Giusti et al. (2007).


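10. The exclusion lemma

Lemma 10.1. Let $f \in S(\mathcal{H}_d)$ and let $x, y \in S^n$ with $\rho(x, y) \le \sqrt{2}$.
Then,

\[ \|f(x) - f(y)\| \le \sqrt{\max(d_i)} \, \rho(x, y). \]

In particular, let $\delta = \min(\|f(x)\| / \sqrt{\max(d_i)}, \sqrt{2})$. If $f(x) \ne 0$, then
there is no zero of f in

\[ B(x, \delta) = \{ y \in S^n : \rho(x, y) < \delta \}. \]

Proof. First of all,

\begin{align*}
|f_i(x) - f_i(y)| &= |\langle f_i(\cdot), K_{d_i}(\cdot, x) - K_{d_i}(\cdot, y) \rangle| \\
  &\le \|f_i\| \, \|K_{d_i}(\cdot, x) - K_{d_i}(\cdot, y)\| \\
  &= \|f_i\| \sqrt{K_{d_i}(x, x) + K_{d_i}(y, y) - 2 K_{d_i}(x, y)} \\
  &= \|f_i\| \sqrt{2} \sqrt{1 - \cos(\theta)^{d_i}}
\end{align*}

with $\theta = \rho(x, y)$. Since $\theta \le \pi < \sqrt{30}$, we have always

\[ \cos(\theta) = 1 - \frac{\theta^2}{2} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots \ge 1 - \frac{\theta^2}{2}. \]

The reader will check that for $\epsilon < 1$, $(1 - \epsilon)^d \ge 1 - d\epsilon$. Therefore,
using $\theta \le \sqrt{2}$,

\[ |f_i(x) - f_i(y)| \le \|f_i\| \sqrt{d_i} \, \theta \]

and

\[ \|f(x) - f(y)\| \le \sqrt{\max(d_i)} \, \theta. \]

□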
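The Lipschitz estimate of Lemma 10.1 can be observed experimentally for a single binary form of unit norm. The sketch below is a sanity check, not a proof; the degree, the random seed and the sampling ranges are arbitrary choices, and points of the circle are described by their angle.

```python
import math
import numpy as np

def evaluate(c, theta):
    """Value of the binary form sum_k c[k] x0^(d-k) x1^k at (cos(theta), sin(theta))."""
    d = len(c) - 1
    x0, x1 = math.cos(theta), math.sin(theta)
    return sum(ck * x0 ** (d - k) * x1 ** k for k, ck in enumerate(c))

rng = np.random.default_rng(1)
d = 5
c = rng.standard_normal(d + 1)
weyl = math.sqrt(sum(ck ** 2 / math.comb(d, k) for k, ck in enumerate(c)))
c = c / weyl                                  # now f lies on the unit sphere of H_d

worst = 0.0                                   # largest |f(x) - f(y)| / (sqrt(d) * rho) seen
for _ in range(1000):
    tx = rng.uniform(0.0, 2 * math.pi)
    rho = rng.uniform(1e-3, math.sqrt(2))     # angular distance between x and y
    ratio = abs(evaluate(c, tx) - evaluate(c, tx + rho)) / (math.sqrt(d) * rho)
    worst = max(worst, ratio)
```

The observed ratio stays below 1, as the lemma requires for $\rho(x, y) \le \sqrt{2}$.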

Part 3. The algorithm and its complexity 

11. Convexity and geometry Lemmas 

Definition 11.1. Let $y_1, \dots, y_s \in S^n$ belong to the same hemisphere,
that is $\langle y_i, z \rangle > 0$ for a fixed z. The spherical convex hull of
$y_1, \dots, y_s$ is defined as

\[ SCH(y_1, \dots, y_s) = \left\{ \frac{\lambda_1 y_1 + \cdots + \lambda_s y_s}{\|\lambda_1 y_1 + \cdots + \lambda_s y_s\|} : \lambda_1, \dots, \lambda_s \ge 0 \text{ and } \lambda_1 + \cdots + \lambda_s = 1 \right\}. \]

This is the same as the intersection of the sphere with the cone
$\{ \lambda_1 y_1 + \cdots + \lambda_s y_s : \lambda_1, \dots, \lambda_s \ge 0 \}$. We will need the following
convexity Lemma from Cucker et al. (2008):

Lemma 11.2. Let $y_1, \dots, y_s \in S^n$ belong to the same hemisphere.
Let $r_1, \dots, r_s > 0$ and let $B(y_i, r_i) = \{ x \in S^n : \rho(x, y_i) < r_i \}$. If
$\bigcap B(y_i, r_i) \ne \emptyset$, then $SCH(y_1, \dots, y_s) \subset \bigcup B(y_i, r_i)$.

Exercise 11.1. Prove Lemma 11.2 above.

For the root counting algorithm, we will need to define a mesh on
the sphere.

Lemma 11.3. For every $\eta = 2^{-k}$, we can construct a set $C(\eta) \subset S^n$
satisfying:

(1) For all $z \in S^n$, $\exists x \in C(\eta)$ such that $\rho(z, x) \le \eta \sqrt{n} / 2$.

(2) For all $x \in S^n$, let $Y = \{ y \in C(\eta) : \rho(x, y) \le \sqrt{n} \, \eta \}$. Then
$x \in SCH(Y)$.

(3) $\#C(\eta) \le 2n (1 + 2^{k+1})^n$.

Proof. Just set

\[ C(\eta) = \left\{ \frac{x}{\|x\|_2} : x \in \mathbb{R}^{n+1}, \; x_i \eta^{-1} \in \mathbb{Z}, \; \|x\|_\infty = 1 \right\}. \]

This corresponds to dividing $Q = \{ x : \|x\|_\infty = 1 \}$ into n-cubes of side
$\eta$. The maximal distance in Q between a point $Z \in Q$ and a point X
in the mesh is half of the diagonal, or $\eta \sqrt{n} / 2$. Then

\[ \rho(Z / \|Z\|, X / \|X\|) \le \eta \sqrt{n} / 2. \]

Now, let Y' be the set of points $y \in C(\eta)$ such that the distance
along Q between $x / \|x\|_\infty$ and $y / \|y\|_\infty$ is at most $\eta$. Then clearly
$x \in SCH(Y')$. Moreover, $Y' \subset Y$.

The last item is trivial. □
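The mesh of the proof is easy to generate for small n. The sketch below enumerates the grid points on the boundary of the cube, projects them to the sphere, and estimates the covering radius of item (1) on random sample points; the function name `mesh`, the sample size and the choice n = 2, k = 1 are illustrative.

```python
import itertools
import math
import numpy as np

def mesh(n, k):
    """Points of C(eta) on S^n for eta = 2**(-k): grid points of the cube boundary
    {x : ||x||_inf = 1} with coordinates in eta*Z, projected radially to the sphere."""
    m = 2 ** k                                        # m = 1 / eta
    points = []
    for idx in itertools.product(range(-m, m + 1), repeat=n + 1):
        if max(abs(i) for i in idx) == m:             # on the boundary of the cube
            v = np.array(idx, dtype=float) / m
            points.append(v / np.linalg.norm(v))
    return np.array(points)

n, k = 2, 1
C = mesh(n, k)
eta = 2.0 ** (-k)

# Empirical covering radius over random points of the sphere.
rng = np.random.default_rng(0)
Z = rng.standard_normal((500, n + 1))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
angles = np.arccos(np.clip(Z @ C.T, -1.0, 1.0))       # angular distances rho(z, x)
covering = angles.min(axis=1).max()
```

For n = 2 and k = 1 the mesh has 98 points, and every sampled point is within $\eta \sqrt{n} / 2$ of the mesh. The enumeration visits $(2^{k+1} + 1)^{n+1}$ grid points, so this direct approach is only practical for small n and k.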



12. The counting algorithm 

Given $f \in S(\mathcal{H}_d)$ and $\eta = 2^{-k}$, we construct a graph $G_\eta = (V_\eta, E_\eta)$
as follows. Let

\[ A(f) = \left\{ x \in S^n : \frac{\max d_i^{3/2}}{2} \, \mu(f, x)^2 \|f(x)\| \le \alpha_* \right\} \]

be the set of points satisfying the hypotheses of Theorem 9.2. The set
of vertices of $G_\eta$ is $V_\eta = C(\eta) \cap A(f)$.

Recall that $B_x = \{ y \in S^n : \rho(x, y) \le r_x \}$ with $r_x = r_0(\alpha_*) \mu(f, x) \|f(x)\|$.
The set of edges of $G_\eta$ is $E_\eta = \{ (x, y) \in V_\eta \times V_\eta : B_x \cap B_y \ne \emptyset \}$. This
graph is clearly constructible. Theorem 9.2 implies that for any edge
$(x, y) \in E_\eta$, $\zeta_x = \zeta_y$. More generally,

Lemma 12.1. The vertices of any connected component of $G_\eta$ are
approximate zeros associated to the same zero of f. Moreover, if x, y
belong to distinct connected components of $G_\eta$, then $\zeta_x \ne \zeta_y$.
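The connected components of such a graph can be computed with a union-find structure once the vertices and the radii $r_x$ are known. The sketch below takes them as given inputs (computing $\mu(f, x)$ is not shown) and uses the fact that two closed spherical balls intersect exactly when the angular distance of their centers is at most the sum of the radii; the sample vertices and radii are made up.

```python
import numpy as np

def connected_components(points, radii):
    """Group vertices whose closed balls B_x = {y : rho(x, y) <= r_x} intersect."""
    m = len(points)
    parent = list(range(m))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for i in range(m):
        for j in range(i + 1, m):
            rho = np.arccos(np.clip(points[i] @ points[j], -1.0, 1.0))
            if rho <= radii[i] + radii[j]:     # the two balls meet: (i, j) is an edge
                parent[find(i)] = find(j)      # merge the two components

    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Hypothetical vertices on the circle S^1, given by their angles, with made-up radii.
angles = [0.0, 0.1, 2.0]
V = np.array([[np.cos(t), np.sin(t)] for t in angles])
components = connected_components(V, [0.06, 0.06, 0.06])
```

Here the first two vertices are joined by an edge and the third is isolated, so the graph has two components.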

The algorithm is as follows: 

Algorithm RootCount
Input: $f \in S(\mathcal{H}_d)$.
Output: $\#\{ \zeta \in S^n : f(\zeta) = 0 \}$.



$\eta \leftarrow 2^{-\lceil \log_2 \sqrt{n} \rceil}$.



Repeat
    $\eta \leftarrow \eta / 2$.
    Let $U_1, \dots, U_r$ be the connected components of $G_\eta$.
Until $\forall \, 1 \le i < j \le r$, $\forall x$ vertex of $U_i$, $\forall y$ vertex of $U_j$,

(16)    $\rho(x, y) > 2 \eta \sqrt{n}$

and $\forall x \in C(\eta) \setminus A(f)$,

(17)    $\|f(x)\| > \eta \sqrt{n \max d_i} / 2$.
Return r.

Theorem 12.2. If the algorithm RootCount stops, then r is the correct
number of roots of f in $S^n$.

Proof of Th. 12.2. Suppose the algorithm stopped at a certain value of
$\eta$. As each connected component $U_i$ determines a distinct and unique
zero of f, it remains to prove that there are no zeros of f outside
$\bigcup_{x \in V_\eta} B_x$.

Therefore, assume by contradiction that there is $\zeta \in S^n$ with $f(\zeta) = 0$
and $\zeta \notin B_x$ for any $x \in V_\eta$.

Let Y be the set of $y \in C(\eta)$ with $\rho(\zeta, y) \le \eta \sqrt{n}$.

If there is $y \in Y$ with $y \notin A(f)$ let $\delta = \|f(y)\| / \sqrt{\max d_i}$. Equation
(17) guarantees that $\eta \sqrt{n} / 2 < \delta$. By construction, $\eta \sqrt{n} / 2 \le \sqrt{2}$.
Therefore, the exclusion lemma 10.1 guarantees that $f(\zeta) \ne 0$, contradiction.

Therefore, we assume that $Y \subset A(f)$. Equation (16) guarantees that
$Y \subset U_k$ for a same connected component of $G_\eta$. Therefore, $\bigcap_{y \in Y} B_y \ni \zeta_y$
is not empty.

By Lemma 11.3(2), $\zeta \in SCH(Y)$. Lemma 11.2 says that

\[ SCH(Y) \subset \bigcup_{y \in Y} B_y. \]

Thus, $\zeta \in B_y$ for some y, contradiction again.

□



A consequence of Th. 12.2 is that if the algorithm stops, one can
obtain an approximate zero of the second kind for each root of f by
recovering one vertex for each connected component.

13. Complexity

We did not prove that algorithm RootCount stops. It actually stops
almost surely, that is for input f outside a certain measure zero set.
Define

\[ \kappa(f, x) = \frac{1}{\sqrt{\mu(f, x)^{-2} + \|f(x)\|^2}} \]

and notice that

\[ \kappa(f, x) \le \mu(f, x) \quad \text{and} \quad \kappa(f, x) \le \|f(x)\|^{-1}. \]

Reciprocally,

\[ \min(\mu(f, x), \|f(x)\|^{-1}) \le \sqrt{2} \, \kappa(f, x). \]

If f(x) = 0, then $\kappa(f, x) = \mu(f, x)$.
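These inequalities are elementary and can be checked on sample numbers. In the sketch below the values of $\mu(f, x)$ and $\|f(x)\|$ are made up; only the formula for $\kappa(f, x)$ comes from the text.

```python
import math

def kappa(mu, residual):
    """kappa(f, x) computed from mu = mu(f, x) and residual = ||f(x)||."""
    return 1.0 / math.sqrt(mu ** -2 + residual ** 2)

mu_val, res = 3.0, 0.25          # made-up sample values
k = kappa(mu_val, res)           # 1 / sqrt(1/9 + 1/16) = 12/5
at_root = kappa(mu_val, 0.0)     # residual 0: kappa coincides with mu
```

With these values $\kappa = 2.4$, which is below both $\mu = 3$ and $\|f(x)\|^{-1} = 4$, while $\sqrt{2} \, \kappa \approx 3.39$ exceeds their minimum.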



Definition 13.1. The condition number for Problem 1.2 (counting
real zeros on the sphere) is

\[ \kappa(f) = \max_{x \in S^n} \kappa(f, x). \]

Assume that f has no degenerate root. Then the denominator is
bounded away from zero, and $\kappa(f)$ is finite. We will prove later that
the algorithm stops for $\kappa(f)$ finite. But before, we state and prove the
condition number theorem to obtain some geometric intuition on
$\kappa(f)$.



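Theorem 13.2. (Cucker, Krick, Malajovich, and Wschebor, 2009)
Let $\Sigma_{\mathbb{R}} = \{ g \in \mathcal{H}_d : \exists \zeta \in S^n : g(\zeta) = 0 \text{ and } \mathrm{rk}(Dg(\zeta)) < n \}$. Let
$f \in S(\mathcal{H}_d)$, $f \notin \Sigma_{\mathbb{R}}$. Then,

\[ \kappa(f) = \frac{1}{\min_{g \in \Sigma_{\mathbb{R}}} \|f - g\|}. \]

In particular, $\kappa(f) \ge 1$.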
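Proof. It suffices to prove that

\[ \kappa(f, x) = \frac{1}{\min \{ \|f - g\| : g \in \mathcal{H}_d, \; g(x) = 0, \; \mathrm{rk}(Dg(x)) < n \}}. \]

We proceed as in the proof of Prop. 8.1. We decompose

\[ \mathcal{H}_d = H_0 \oplus H_1 \oplus H_2 \oplus \cdots \]

where $H_0$ and $H_1$ correspond to the constant and linear terms of $y \mapsto f(x + y)$.
Let $u_1, \dots, u_n$ be an orthonormal basis for $x^\perp$.
An orthonormal basis for $H_0 \oplus H_1$ is

\[ \left( K_{d_i}(\cdot, x) \, e_i, \;\; \frac{1}{\sqrt{d_i}} \frac{\partial K_{d_i}(\cdot, x)}{\partial u_j} \, e_i \right)_{1 \le i, j \le n}. \]

The projection of f in $H_0 \oplus H_1$ is

\[ \left[ \; \langle f_i(\cdot), K_{d_i}(\cdot, x) \rangle \;\; \Big| \;\; \left\langle f_i(\cdot), \frac{1}{\sqrt{d_i}} \frac{\partial K_{d_i}(\cdot, x)}{\partial u_j} \right\rangle \; \right]
 = \left[ \; f(x) \;\; \Big| \;\; \begin{bmatrix} d_1^{-1/2} & & \\ & \ddots & \\ & & d_n^{-1/2} \end{bmatrix} Df(x)_{|x^\perp} \; \right]. \]

This is an orthogonal projection onto $H_0 \oplus H_1$.
Now,

\[ \kappa(f, x)^{-2} = \|f(x)\|^2 + \sigma_n \left( \begin{bmatrix} d_1^{-1/2} & & \\ & \ddots & \\ & & d_n^{-1/2} \end{bmatrix} Df(x)_{|x^\perp} \right)^2. \]

Again, we apply Th. 6.3.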



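Lemma 13.3. Let $\zeta_1, \zeta_2$ be distinct roots of f in $S^n$. Then,

\[ \rho(\zeta_1, \zeta_2) \ge \frac{1}{\max d_i^{3/2} \, \kappa(f)}. \]

Proof.

\begin{align*}
\|\zeta_1 - \zeta_2\| &\ge \frac{1}{2 \gamma(f, \zeta_1)} && \text{by Ex. 4.3} \\
  &\ge \frac{1}{\max d_i^{3/2} \, \mu(f, \zeta_1)} && \text{by Lem. 9.1} \\
  &\ge \frac{1}{\max d_i^{3/2} \, \kappa(f)} && \text{because } f(\zeta_1) = 0.
\end{align*}

The Lemma follows.

Lemma 13.4. Assume that

\[ \eta < \frac{1}{2 \max d_i^{3/2} \sqrt{n} \, \kappa(f)} \, (1 - 2 \alpha_* r_0(\alpha_*)). \]

Then (16) holds.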



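Proof. Recall that x and y belong to $A_f$, so that

\[ \max d_i^{3/2} \, \mu(f, x)^2 \|f(x)\| \le 2 \alpha_* \]

and the same for y. In particular, the radius $r_x$ of $B_x$ satisfies

\[ r_0(\alpha_*) \mu(f, x) \|f(x)\| \le \frac{2 \alpha_* r_0(\alpha_*)}{\max d_i^{3/2} \, \mu(f, x)} \le \frac{2 \alpha_* r_0(\alpha_*)}{\max d_i^{3/2} \, \kappa(f, x)}. \]

By Lemma 13.3 and the triangle inequality,

\begin{align*}
\rho(x, y) &\ge \rho(\zeta_x, \zeta_y) - r_0(\alpha_*) \mu(f, x) \|f(x)\| - r_0(\alpha_*) \mu(f, y) \|f(y)\| \\
  &\ge \frac{1}{\max d_i^{3/2} \, \kappa(f)} \, (1 - 2 \alpha_* r_0(\alpha_*)).
\end{align*}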


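Lemma 13.5. Let $x \notin A_f$. Then,

\[ \|f(x)\| > \frac{\alpha_*}{\kappa(f, x)^2 \max d_i^{3/2}}. \]

Proof. Let $x \notin A_f$, so that

\[ \frac{\max d_i^{3/2}}{2} \, \mu(f, x)^2 \|f(x)\| > \alpha_*. \]

Recall that

\[ \min(\mu(f, x), \|f(x)\|^{-1}) \le \sqrt{2} \, \kappa(f, x). \]

There are two possibilities. If $\mu(f, x) \le \sqrt{2} \, \kappa(f, x)$, then

\[ \|f(x)\| > \frac{\alpha_*}{\max d_i^{3/2} \, \kappa(f, x)^2}. \]

Otherwise,

\[ \|f(x)\| \ge \frac{1}{\sqrt{2} \, \kappa(f, x)} > \frac{\alpha_*}{\max d_i^{3/2} \, \kappa(f, x)^2}. \]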

Now we can state the 'cloud complexity' theorem. 
Theorem 13.6. The algorithm RootCount will stop for 
1 / K(f) , , 



□ 



□ 



maxd^^^K(f)2 V 

that is, after $O(\log \kappa(f) + \log \max d_i)$ iterations. The total number of
evaluations of f and Df is

2n{l + Amaxdf'^^/^K{^f)''. 
That means that the same number of processors working in parallel
can compute the root count in time $O(\log \kappa(f) + \log \max d_i)$ times a
polynomial in n for the linear algebra.

For people concerned with the overall computing cost, a price tag 
exponential in n is known as the curse of dimensionality. It usually 
plagues divide and conquer and Monte-Carlo algorithms. 

But the situation n = 2 is already interesting. How efficiently can
we count zeros of a system of polynomials on the 2-sphere? As the
parallel and sequential running time depends upon $\kappa(f)$, it is useful to
know more about the condition number.



14. Probabilistic and smoothed analysis 

One possibility is to pick the input system f at random, and treat
$\kappa(f)$ as a random variable. For instance, let $f \in \mathcal{H}_d$ be random with
Gaussian probability distribution

\[ \frac{1}{(2\pi)^{\dim \mathcal{H}_d / 2}} \, e^{-\|f\|^2 / 2} \, d\mathcal{H}_d. \]

The tail for the random variable $\kappa(f)$ and the expected value of
$\log \kappa(f)$ can be bounded by

Theorem 14.1. (Cucker, Krick, Malajovich, and Wschebor, 2012)
Let f be as above. Assume that $n \ge 3$. Then,

(i) For a > Ay/2 {max di)'^n'^^'^N^/ we have 

Prob(K(f) > a) < Kr,- ^ ^ — , 

where N = dim V.^, := 8(max difv^/^ N^/^n^l^ + 1 andV = ]\di. 
(ii) 

E(ln/€(f)) < lnir„ + {InKnf/'^ + (lnir„)-^/2 ^ ^ \^{2n). 

Notice as a consequence that the expected running time of RootCount
is $E(\ln \kappa(f)) \in O(n \ln \max d_i)$. This is cloud computing time, of course.



Average time analysis depends upon an arbitrary distribution. Spielman
and Teng (2004) suggested looking instead at a small random
perturbation for each given input. This is known as smoothed analysis.

For a given $f \in S(\mathcal{H}_d)$, we will consider the uniform distribution in
the ball $B(f, \arcsin \sigma) \subset S(\mathcal{H}_d)$ where $\sigma$ is an arbitrary radius, and
Riemannian metric on the sphere is assumed. The strange looking arcsine
comes from the fact that $B(f, \arcsin \sigma)$ is the projection on the
sphere of the ball $B(f, \sigma) \subset \mathcal{H}_d$. The reason for looking at the uniform
distribution for perturbations instead of Gaussian is the following
result:



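Theorem 14.2. (Bürgisser, Cucker, and Lotz, 2008) Let $\Sigma \subset S^{N-1}$ be
contained in a projective hypersurface H of degree at most D and let
$\kappa : S^{N-1} \to [1, \infty]$ be given by

\[ \kappa(f) = \frac{1}{\min_{g \in \Sigma} \|f - g\|}. \]

Then, for all $\sigma \in (0, 1]$,

\[ \sup_{f \in S^{N-1}} E_{h \in B(f, \arcsin \sigma) \subset S^{N-1}} (\ln \kappa(h)) \le 2 \ln(N - 1) + 2 \ln D - \ln \sigma + 5.5. \]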

In the context of the root counting problem, the degree D of $\Sigma = \Sigma_{\mathbb{R}}$
is bounded by $n^2 (\prod d_i)(\max d_i)$. Therefore,

Corollary 14.3. (Cucker, Krick, Malajovich, and Wschebor, 2009)

\[ \sup_{f \in S(\mathcal{H}_d)} E_{h \in B(f, \arcsin \sigma) \subset S(\mathcal{H}_d)} (\ln \kappa(h)) \le 2 \ln(\dim(\mathcal{H}_d)) + 4 \ln(n) + 2 \ln\Big(\prod d_i\Big) + \ln(1/\sigma) + 6. \]



15. Conclusions 

We sketched the average time analysis and a smoothed analysis of 
an algorithm for real root counting and, incidentally, root finding. The 
same algorithm can also decide if a given polynomial system admits a 
root. 

Loosely speaking, deciding (resp. counting) roots of polynomial sys- 
tems are NP-complete (resp. #P complete) problems. The formal 
NP-complete and #P-complete problems refer to sparse polynomial 
systems. 

Our algorithm actually requires polynomial evaluations, so it can
take advantage of the sparse structure. Moreover, the degree of the
sparse discriminant is no more than the degree of the usual discriminant.
In that sense Corollary 14.3 is still valid. The running time of
the algorithm is polynomial in n and in the dimension of the input
space. Again, this is a massively parallel algorithm so the number of
processors is exponential in n.